Development: The DVB Decoder Challenge

From LinuxTVWiki
Latest revision as of 14:05, 30 August 2008

Implementing a hardware- or software-based MPEG-2 Decoder raises several challenges. Most of these hurdles are not too hard to solve, but they are very DVB-specific and can be quite annoying if not handled properly.

This page lists the common obstacles a developer may encounter, discusses the usual approaches around them and outlines elegant solutions.

The 0x47 MPEG frame Aligner

Most DVB cards don't output the MPEG-2 data aligned to 188-byte or 204-byte MPEG-2 packet boundaries. This would be hard to achieve on most hardware, since DMA controllers are much simpler and cheaper to implement if they can rely on some alignment assumptions.

So the incoming data needs to be aligned to packet boundaries. This is achievable with a simple state machine; see the MPEG-2 Frame Aligner article for a detailed discussion and some sample code.
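As a rough illustration of the alignment step, the following Python sketch scans a buffer for the 0x47 sync byte and accepts a position only if the byte repeats at packet distance. It is a simplified whole-buffer scan rather than the streaming state machine from the linked article; the function names and the `confirm` parameter are assumptions made for this example.

```python
TS_SYNC_BYTE = 0x47
TS_PACKET_SIZE = 188  # pass 204 for streams that carry 16 extra parity bytes


def find_sync_offset(buf, packet_size=TS_PACKET_SIZE, confirm=3):
    """Return the offset of the first packet boundary in buf, or -1.

    0x47 also occurs inside payload data, so a candidate offset is only
    accepted if the sync byte repeats at `confirm` consecutive packet starts.
    """
    # Highest offset for which all `confirm` probes stay inside the buffer.
    last_start = len(buf) - packet_size * (confirm - 1)
    for offset in range(max(0, min(packet_size, last_start))):
        if all(buf[offset + i * packet_size] == TS_SYNC_BYTE
               for i in range(confirm)):
            return offset
    return -1


def aligned_packets(buf, packet_size=TS_PACKET_SIZE):
    """Split buf into complete packets, starting at the detected boundary."""
    offset = find_sync_offset(buf, packet_size)
    if offset < 0:
        return []
    packets = []
    # Stop at the first position where sync is lost or the data runs out.
    while offset + packet_size <= len(buf) and buf[offset] == TS_SYNC_BYTE:
        packets.append(bytes(buf[offset:offset + packet_size]))
        offset += packet_size
    return packets
```

A real aligner would keep the unconsumed tail of each DMA buffer and prepend it to the next one, and would re-run the search whenever sync is lost.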


PID Filtering/Demultiplexing

The incoming MPEG-2 Transport Stream contains the MPEG-2 packets of all services transmitted on this transponder, so we need to pick out the ones that are of interest to us. See the PID Filter - Demultiplexer section for details and code.
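A minimal sketch of the filtering step, assuming the packets are already aligned: the 13-bit PID sits in the low five bits of byte 1 and all of byte 2 of each TS packet. The helper names are made up for this example and are not taken from the linked article.

```python
def packet_pid(packet):
    """Extract the 13-bit PID from bytes 1 and 2 of an aligned TS packet."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def filter_pids(packets, wanted_pids):
    """Keep only packets that start with 0x47 and carry one of the wanted PIDs."""
    wanted = set(wanted_pids)
    return [p for p in packets
            if p[0] == 0x47 and packet_pid(p) in wanted]
```

For example, `filter_pids(packets, {0x100, 0x101})` would keep one video and one audio PID of a service, assuming those are the PIDs announced for it.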


The STC sync problem

Whenever a client has to decode a live stream from a server, it has to adjust its own STC (System Time Clock) to that of the server, for several reasons:

  • Transmitted data is bursty, so the decoder has to display content with a small delay. This delay should be kept to a minimum, otherwise you will always hear your neighbors celebrating the soccer championship goal 3 seconds before you can see it.
  • The server clock may run continuously faster or slower than the host clock, so the time difference may increase over time.

The solution is the PCR (Program Clock Reference): a special clock reference value transmitted every few MPEG-2 TS packets in the adaptation field that follows the TS packet header. This reference allows the client to synchronize its own clock to that of the server. Hardware MPEG-2 decoders use voltage-controlled oscillators or numerically controlled oscillators for this purpose.
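To make the PCR more concrete, here is a hedged sketch of how it could be read from a single aligned 188-byte packet. It relies on the ISO/IEC 13818-1 layout: a 33-bit base counted at 90 kHz, six reserved bits and a 9-bit extension counted at 27 MHz, stored in the adaptation field when the PCR flag is set. The function names are invented for this example.

```python
PCR_CLOCK_HZ = 27_000_000


def parse_pcr(packet):
    """Return the PCR in 27 MHz ticks, or None if the packet carries none."""
    if len(packet) < 12 or packet[0] != 0x47:
        return None
    has_adaptation_field = packet[3] & 0x20
    # Flags byte plus six PCR bytes need an adaptation field length of >= 7.
    if not has_adaptation_field or packet[4] < 7 or not (packet[5] & 0x10):
        return None
    b = packet[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    extension = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + extension


def pcr_to_seconds(pcr):
    """Convert 27 MHz PCR ticks to seconds."""
    return pcr / PCR_CLOCK_HZ
```

The value wraps around roughly every 26.5 hours, so code that compares two PCRs should work with differences rather than absolute values.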

Software decoders followed different approaches in the past:

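  • Naive implementations just watch the buffer fill level and drop or delay frames.
  • VLC (VideoLAN Client, http://www.videolan.org/) low-pass filters the incoming clock references and uses a linear approximation algorithm to approach the server clock reference. This works pretty well, but unfortunately all the timing code is very VLC-specific and not easy to reuse.
  • An approach that is a little harder to understand in theory, but very efficient and trivial to implement, uses Kalman filtering; see the Kalman Filtering for STC/PCR smoothing article.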
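The low-pass idea can be sketched with a first-order filter on the offset between the received PCR and the local clock. This is not VLC's code and not the Kalman filter from the linked article; the class name and the `alpha` smoothing factor are assumptions chosen for illustration.

```python
class SmoothedClock:
    """Track the server clock by low-pass filtering (PCR - local time)."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha   # smaller values smooth more but react slower
        self.offset = None   # smoothed (server - local) offset in seconds

    def update(self, pcr_seconds, local_seconds):
        """Feed one PCR sample together with the local time it arrived at."""
        sample = pcr_seconds - local_seconds
        if self.offset is None:
            self.offset = sample          # first sample initialises the filter
        else:
            self.offset += self.alpha * (sample - self.offset)
        return self.offset

    def stc(self, local_seconds):
        """Estimate the server's system time clock for a given local time."""
        return local_seconds + self.offset
```

A filter like this removes arrival jitter but follows a constant frequency difference only with a lag, which is the gap the linear approximation and Kalman approaches are meant to close.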

Audio/Video Sync

You should ensure that audio and video frames are presented to the user at the System Time Clock value encoded in each frame's PTS (Presentation Time Stamp).

The STC should be synchronized regularly and smoothly to the server clock using the PCR. For recorded playback you can use either the host clock, the video frame sync interrupt or the audio crystal as the clock reference.
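A small sketch of the presentation decision, assuming both PTS and STC are expressed in 90 kHz ticks with the 33-bit wraparound used for PTS values. The `tolerance` of 900 ticks (10 ms) and the three return strings are arbitrary choices for this example.

```python
PTS_MODULO = 1 << 33  # PTS values are 33-bit counters at 90 kHz


def pts_delta(pts, stc):
    """Signed difference pts - stc in 90 kHz ticks, wraparound-safe."""
    delta = (pts - stc) % PTS_MODULO
    if delta >= PTS_MODULO // 2:
        delta -= PTS_MODULO
    return delta


def schedule_frame(pts, stc, tolerance=900):
    """Decide what to do with a decoded frame at the current STC."""
    delta = pts_delta(pts, stc)
    if delta > tolerance:
        return "wait"      # frame is early: keep it queued
    if delta < -tolerance:
        return "drop"      # frame is late: skip it to catch up
    return "present"
```

The same comparison works for audio blocks, where "drop" and "wait" would typically become sample skipping or padding.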

Audio Clock pitching

Since the sample rate of most sound cards can't be adjusted smoothly while playing, it may be necessary to resample the audio signal in software before sending it to the sound card. Naive nearest-neighbor or sample-drop approaches are trivial to implement, and even linear filtering costs only a few lines of code. Most audio and decoder libraries have resampling routines built in, and there are also resampling libraries available on the net.

ISO/IEC 13818-1 limits the rate of change of the system clock frequency to 0.075 Hz/s in order to avoid audible artifacts in audio playback.
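The linear filtering mentioned above really does fit in a few lines. The sketch below resamples a mono block of samples by linear interpolation; the function name and the convention `ratio = output_rate / input_rate` are assumptions for this example.

```python
def resample_linear(samples, ratio):
    """Resample a mono sample block by linear interpolation.

    ratio is output_rate / input_rate; a value slightly above or below 1.0
    stretches or shrinks the block to pitch the audio clock.
    """
    if len(samples) < 2 or ratio <= 0:
        return list(samples)
    last = len(samples) - 1
    out_len = int(last * ratio) + 1
    out = []
    for n in range(out_len):
        pos = n / ratio                 # position in the input signal
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out
```

For continuous playback the fractional read position has to be carried over from one block to the next, otherwise small clicks appear at block boundaries.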

Screen/Decoder Sync Aliasing

Unless the display refresh rate is at least twice as high as the frame rate of the displayed video, you will get aliasing artifacts (jerky video due to dropped or doubled frames). See Wikipedia:Nyquist and Wikipedia:Nyquist-Shannon_sampling_theorem for an overview of sampling theory and a short explanation of why aliasing artifacts occur when you sample at rates below the Nyquist frequency.

So it's best for video decoder systems either to maintain the exact frame rate of the encoded material or, if this is not possible (e.g. when a decoder displays to your computer monitor), to keep the display refresh frequency at least twice as high as the frame rate of the decoded video.

In any case you should only update the screen content during the video refresh interval. Use e.g. the Sync Extension of OpenGL or the appropriate Blit-Wait flag of DirectFB in your implementation.
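To see where the doubled or dropped frames come from, the following sketch counts how many refresh intervals each video frame stays on screen when the display simply shows the newest frame at every refresh. It assumes integer rates and an idealised display model; the function name is hypothetical.

```python
def repeat_pattern(frame_rate, refresh_rate, frames):
    """Refresh intervals spent on each of the first `frames` video frames.

    Model: at refresh k the display shows frame floor(k * frame_rate /
    refresh_rate). Both rates must be positive integers.
    """
    def first_refresh(n):
        # Smallest k with k * frame_rate >= n * refresh_rate (integer ceil).
        return -(-n * refresh_rate // frame_rate)

    return [first_refresh(n + 1) - first_refresh(n) for n in range(frames)]
```

Under this model `repeat_pattern(50, 60, 5)` gives `[2, 1, 1, 1, 1]`, i.e. one doubled frame in every five, and `repeat_pattern(50, 25, 4)` gives `[1, 0, 1, 0]`, i.e. every second frame dropped, while matching rates yield a constant pattern.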


Use Texture Units or Overlay?

Modern graphics hardware usually provides two different approaches commonly used for video display. The 'classic' one is to use the video overlay plane for rendering and scaling (basically a second framebuffer over the normal one, combined by colorkeying or in some cases alpha blending). This approach has several problems: different cards require different rendering algorithms, and some don't provide double-buffered rendering. Only scaling and in some cases color conversion are hardware-accelerated; deinterlacing usually is not. The common API for accessing these features of a graphics card under X11 is the Xv Extension.

The more flexible and robust approach is to use the texture engines of the graphics card, e.g. through the OpenGL API. These render paths are well tested and exercised by 3D computer games, and all cards since the old 3Dfx Voodoo cards can be accessed using the same API. Texture combine operations can be used to perform color conversion and deinterlacing in hardware. Interpolating scaling is done inherently by the texture units. Even nonlinear picture distortion (e.g. for mapping 4:3 pictures onto 16:9 or 16:10 displays) can easily be implemented by modifying the texture coordinates on the rendering rectangle.


Deinterlacing

Plenty of deinterlacing algorithms are known, and even simple blend filters (like the one implemented in ffmpeg's libavcodec, http://ffmpeg.sf.net) can perform very well. A more serious problem is that many deinterlacers are top-field-only (or bottom-field-only) and reduce the frame rate from 50Hz (interlaced) to 25Hz (progressive). This may look fine and cinema-like when watching Hollywood movies, but it makes scrolling text (e.g. credits and news tickers) jerky and hard to read.

The correct approach to preserve full temporal resolution is to deinterlace both fields, the even and the odd ones (each blended with the in-between fields from the previously displayed frame).

In order to use ffmpeg's deinterlacer you would need to implement a matching deinterlace_top_field() function in addition to the existing deinterlace_bottom_field() (http://svn.mplayerhq.hu/ffmpeg/trunk/libavcodec/imgconvert.c?revision=14513&view=markup).

When using OpenGL, the deinterlacer can be implemented completely on the graphics card. Enable the multitexturing engines: use one texture unit for the previous frame, one for the blend grid (where the lines have e.g. alternating alpha=0.9 and alpha=0.25 to simulate the decaying phosphor glow), and one for the new frame. Be sure to offset the texture coordinates so that the correct field from the previous frame shines through the grid lines. If the graphics card does not have enough texture units available, you can let it do the work in multiple passes.
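One possible reading of the both-fields scheme, sketched in software on frames stored as lists of pixel rows: each interlaced frame yields two progressive output frames, one per field, and the missing lines of each are blended from the previously displayed frame and the neighbouring lines of the new field. The blend weight `keep`, the function names and the frame layout (even rows are the top field, at least two rows) are all assumptions for this example; it is not ffmpeg's filter and not the OpenGL path described above.

```python
def render_field(frame, parity, displayed, keep=0.5):
    """Build one progressive frame from the field with the given row parity.

    Rows of that parity are copied from `frame`; the other rows mix the
    previously displayed row with the mean of the adjacent new-field rows.
    """
    height = len(frame)
    out = []
    for y in range(height):
        if y % 2 == parity:
            out.append(list(frame[y]))
            continue
        # Neighbours of a missing row always belong to the new field.
        above = frame[y - 1] if y > 0 else frame[y + 1]
        below = frame[y + 1] if y + 1 < height else frame[y - 1]
        out.append([keep * old + (1 - keep) * (a + b) / 2
                    for old, a, b in zip(displayed[y], above, below)])
    return out


def deinterlace_frame(frame, displayed, top_field_first=True):
    """Return two output frames (one per field) for one interlaced frame."""
    first_parity = 0 if top_field_first else 1
    first = render_field(frame, first_parity, displayed)
    second = render_field(frame, 1 - first_parity, first)
    return first, second
```

Feeding the second output back in as `displayed` for the next interlaced frame keeps the output at the full field rate, e.g. 50 progressive frames per second from 25 interlaced ones.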


Downscaling

Upscaling is usually simple. Downscaling is a little harder if you want to avoid artifacts, especially when HDTV transmissions are displayed in small windows on the desktop or on an SDTV screen and the scale factor drops below 0.5. You need either convolution filters with a very large number of taps or, better, to downscale in several steps. The image pyramid approach works fine:

  • Downscale by a factor of 2 using linear interpolation filters until you reach a resolution less than twice the target resolution. Every step averages 4 neighboring pixels into a single pixel on the next smaller level.
  • Now scale down to the target resolution, again using linear interpolation filters (this scale factor is somewhere in the range [0.5...1.0], so the scaling is not susceptible to aliasing).

This algorithm can be implemented completely in hardware on the graphics card using OpenGL: simply enable linear filtering, render to texture and scale by 0.5 until you have reached the last-but-one level, then render your texture into the framebuffer with the final scale correction somewhere in the range [0.5...1.0].
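The two pyramid steps can be sketched in software on a single-channel image stored as a list of rows. This is a CPU illustration only, not the OpenGL render-to-texture path; the function names are hypothetical, and halving stops as soon as either dimension would fall below the target.

```python
def halve(image):
    """Average every 2x2 block into one pixel (odd last row/column dropped)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1] +
              image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def _source_position(index, out_size, in_size):
    """Map an output pixel centre to input coordinates plus a blend weight."""
    pos = (index + 0.5) * in_size / out_size - 0.5
    pos = min(max(pos, 0.0), in_size - 1.0)
    lower = int(pos)
    upper = min(lower + 1, in_size - 1)
    return lower, upper, pos - lower


def bilinear_resize(image, out_w, out_h):
    """Resize with linear interpolation in both directions."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for y in range(out_h):
        y0, y1, wy = _source_position(y, out_h, in_h)
        row = []
        for x in range(out_w):
            x0, x1, wx = _source_position(x, out_w, in_w)
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def pyramid_downscale(image, out_w, out_h):
    """Halve while still at least twice the target size, then scale the rest."""
    while len(image[0]) >= 2 * out_w and len(image) >= 2 * out_h:
        image = halve(image)
    return bilinear_resize(image, out_w, out_h)
```

Scaling a 1920x1080 picture to 400x225, for instance, would halve twice (to 480x270) and leave a final factor of about 0.83 for the interpolating step.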

Color Correction, the Gamma Question

Computer monitors and video projectors have a different gamma curve than television screens, so you need to apply a proper correction curve to the display. All common graphics libraries such as OpenGL, SDL and DirectFB provide an API to set up the gamma color lookup tables. This is not hard to do; it just has to be done correctly, otherwise you risk weak colors on the display.
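A lookup table for such a correction curve can be computed as in the sketch below. The function name is hypothetical, and whether the exponent should be `gamma` or `1 / gamma`, and which value fits a given display, depends on the setup and is an assumption here.

```python
def gamma_lut(gamma, size=256):
    """Build a lookup table mapping input levels to gamma-corrected levels.

    Applies out = in ** (1 / gamma) on values normalised to [0, 1].
    """
    top = size - 1
    return [round(top * (i / top) ** (1.0 / gamma)) for i in range(size)]
```

The resulting table, e.g. `gamma_lut(2.2)`, would then be rescaled to whatever entry width the chosen graphics library expects and loaded once per color channel.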