I was asked by some peeps about how the multimedia pipeline (DirectShow) works and how it relates to WPF, so I've decided to write a little post to cover all this. I've tried to keep it as high level as possible.
DirectShow
DirectShow is the multimedia framework in Windows. Most media players that run on Windows use DirectShow, including Windows Media Player. Ever wonder how you can install a DivX codec and all of a sudden you can play those (legal, I hope) DivX AVI files? How does this witchcraft work?
DirectShow is built on top of COM, which is the "lingua franca of Windows". DirectShow is also a plug-in architecture, so developers can write code that fits snugly into the framework. Let's start by getting the nomenclature right. Plug-ins in DirectShow are called "filters". Filters are all contained in something called a "graph". There are three main types of filters that developers usually concern themselves with.
- Source Filters - These filters provide the source streams of data. This could be raw bytes from an audio file.
- Transform Filters - These "transform" data that comes from other filters' output. The transform could be adding text on top of video or decompressing an MPEG frame.
- Renderer Filters - Pretty much like the name suggests. These filters render the data. Maybe they send audio to the sound card. Maybe they draw video on the screen. Or maybe they even write data to a file.
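To make the three roles concrete, here is a tiny toy model in Python. It is not DirectShow code (real filters are COM objects); the function and class names are made up for illustration, and upper-casing bytes just stands in for "decoding".

```python
from typing import Iterable, Iterator, List


def source(data: bytes, chunk_size: int = 4) -> Iterator[bytes]:
    """Source filter: produces the stream, here as fixed-size chunks of bytes."""
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]


def transform(samples: Iterable[bytes]) -> Iterator[bytes]:
    """Transform filter: changes each sample (upper-casing stands in for decoding)."""
    for sample in samples:
        yield sample.upper()


class Renderer:
    """Renderer filter: the end of the line; it consumes samples."""

    def __init__(self) -> None:
        self.rendered: List[bytes] = []

    def render(self, samples: Iterable[bytes]) -> None:
        for sample in samples:
            # A real renderer would draw to the screen or feed the sound card.
            self.rendered.append(sample)


# Wire the three stages together: source -> transform -> renderer.
renderer = Renderer()
renderer.render(transform(source(b"hello world")))
```

Data only ever flows one way here, from the source through the transform into the renderer, which is the same shape a simple playback graph has.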
Ok, so now we see that DirectShow has this menagerie of filters. How do they talk to each other? Filters have something called "pins". A pin (to put it simply) is an input or output point through which data flows from one filter to another. Each pin can only connect to one other pin, and the two have to agree on what kind of data they are sending.
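The two pin rules (one connection per pin, and both sides must agree on the data type) can be sketched like this. Again, this is a toy model and not the DirectShow API: the `Pin` class and the media type strings are invented for illustration, and real DirectShow describes formats with `AM_MEDIA_TYPE` structures rather than strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Pin:
    name: str
    direction: str                      # "in" or "out"
    media_type: str                     # hypothetical label, e.g. "video/mpeg4"
    connected_to: Optional["Pin"] = None


def connect(out_pin: Pin, in_pin: Pin) -> bool:
    """Connect an output pin to an input pin if both are free and agree on the type."""
    if out_pin.direction != "out" or in_pin.direction != "in":
        return False
    if out_pin.connected_to is not None or in_pin.connected_to is not None:
        return False  # each pin connects to at most one other pin
    if out_pin.media_type != in_pin.media_type:
        return False  # the pins could not agree on the kind of data
    out_pin.connected_to = in_pin
    in_pin.connected_to = out_pin
    return True


splitter_video = Pin("Stream 00", "out", "video/mpeg4")
audio_renderer_in = Pin("Audio In", "in", "audio/pcm")
decoder_in = Pin("In", "in", "video/mpeg4")

mismatch = connect(splitter_video, audio_renderer_in)  # types differ
match = connect(splitter_video, decoder_in)            # types agree
```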
I think the best way to show what the hell I'm talking about is via a program called GraphEdit. GraphEdit is an application that comes with the Windows Platform SDK. It allows developers to test filter configurations without having to write any code. For my first example, here is a screenshot of a DirectShow graph that plays an MPEG4 file.
Each blue square area is a DirectShow filter. The small gray squares are the pins. The arrows show connections between the pins and the direction of the data flow. DirectShow is smart enough to build this graph by itself, just by querying each filter registered in the system to see if it will take a particular data type. It is also possible to build a graph manually.
The filter labeled "Part 1 - The Shorts of the Cosmic Ocean.avi" is the source filter. Its output pin probably just dumps bytes to the "AVI Splitter" (a transform filter). The AVI Splitter makes sense of the raw bytes and splits the output into two separate streams, one video and one audio.
In this particular case, the video pin (Stream 00) of the AVI Splitter pumps out individual MPEG4 (aka DivX) frames to another transform filter I have installed, called "FFDShow". FFDShow then decodes the MPEG4 samples and passes the uncompressed video frames to the "Video Renderer" filter to be displayed on the screen. When we press "Play", we get this:
DirectShow can be used for more than just playback. It can also be used for encoding, transcoding or network transmission of media. For instance, this graph takes the audio input of my computer and writes it to a WMA file.
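Here is a rough sketch of the idea behind automatic graph building: given the type of data coming out of the source, search the registered filters for a chain that ends in a renderer. This is a simplified toy, not how DirectShow actually implements it. The filter table and type strings are made up, it follows a single stream only (no audio/video split), and it uses a plain breadth-first search in place of whatever ordering DirectShow really applies.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical registered filters: name -> (accepted input type, produced output type).
# An output type of None marks a renderer.
REGISTERED: Dict[str, Tuple[str, Optional[str]]] = {
    "AVI Splitter": ("stream/avi", "video/mpeg4"),
    "FFDShow": ("video/mpeg4", "video/raw"),
    "Video Renderer": ("video/raw", None),
    "WAV Parser": ("stream/wav", "audio/pcm"),
}


def build_chain(source_type: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest filter chain that ends in a renderer."""
    queue = deque([(source_type, [])])
    seen = {source_type}
    while queue:
        media_type, chain = queue.popleft()
        for name, (accepts, produces) in REGISTERED.items():
            if accepts != media_type:
                continue  # this filter will not take the data type
            if produces is None:
                return chain + [name]  # reached a renderer, the chain is complete
            if produces not in seen:
                seen.add(produces)
                queue.append((produces, chain + [name]))
    return None  # no combination of registered filters can render this type


avi_chain = build_chain("stream/avi")
wav_chain = build_chain("stream/wav")  # no audio renderer is registered in this toy table
```

With this table, the AVI case resolves to the same splitter, decoder, renderer sequence described for the screenshot, while the WAV case fails because nothing registered accepts the parser's output.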
How does this relate to WPF?
WPF has a control called MediaElement that plays video files. Essentially you provide a source Uri (e.g. http://server/file or c:\myfile.avi) and MediaElement plays it. Pretty simple, huh? MediaElement accomplishes this with a little help from its buddy, Windows Media Player [OCX]. WMP is a DirectShow-based player (it also uses Media Foundation in Vista), so it mostly plays by DShow's rules.
When MediaElement/WMP/DirectShow is told to play a URL such as "http://server/file", DShow will search the registry for an "http" protocol handler. If one is found, the registry will provide the GUID of the COM object for the filter. DShow then looks up the GUID in the registry and finds the DLL file associated with the filter.
Once the filter is loaded and added to the graph, DirectShow passes the http://server/file string to the source filter. The source filter then does "what it needs to" in order to figure out what type of output pins it needs to make. Once the output pins are created, DirectShow queries the pins to find out what type of "media" they carry and constructs the graph with the correct filters in order to render it.
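The two-step lookup (protocol to GUID, then GUID to DLL) looks roughly like the sketch below. Plain dictionaries stand in for the registry here, and the GUID and DLL path are made up for illustration; the real lookup goes through registry keys, not Python dicts.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical stand-ins for registry contents. The GUID and path are invented.
PROTOCOL_HANDLERS: Dict[str, str] = {
    "http": "{00000000-0000-0000-0000-000000000001}",
}
COM_CLASSES: Dict[str, str] = {
    "{00000000-0000-0000-0000-000000000001}": r"C:\example\HttpSource.dll",
}


def find_source_filter_dll(url: str) -> Optional[str]:
    """Resolve a URL to the DLL of its source filter, or None if no handler exists."""
    scheme = urlsplit(url).scheme          # "http" for "http://server/file"
    clsid = PROTOCOL_HANDLERS.get(scheme)  # step 1: protocol -> GUID of the COM object
    if clsid is None:
        return None
    return COM_CLASSES.get(clsid)          # step 2: GUID -> DLL that implements the filter


http_dll = find_source_filter_dll("http://server/file")
unknown_dll = find_source_filter_dll("nosuchprotocol://server/file")
```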
What implications does this have?
The great thing about this protocol handler mechanism is that you can create your own. For instance, I can make a protocol handler called "MySuperDuperProtocol://", and when WPF's MediaElement is instructed to load the Uri "MySuperDuperProtocol://Something/Other", it will load my own custom DirectShow source filter that I have associated with it.
If you can make your own custom source filter and use it in WPF, why did you waste your time making the VideoRendererElement?
Good question. Just because WPF uses your custom source filter doesn't mean WPF will let you talk to it. And it doesn't mean you have control over the DirectShow graph that it constructs. The VideoRendererElement gives a developer full control over any DirectShow graph they wish to create, as well as the ability to do fast pixel updates or GDI-based updates.
Hope this clears things up!
-Jer