As human persons who are not robots we spend a good chunk of our days looking at things. As human persons in 2017, those things tend to be screens. We make a lot of stuff to put on screens, and then the stuff gets watched on screens by many others. For the past couple of years I have been working in an environment that deals with the unmentioned bit that goes in the middle: making things appear on the screen.
A complaint I get a lot from our viewers is that the quality of what we put on their screens is atrocious. Not because the content is bad, but because the platform is bad. And all I can say is that they are right.
Now I work with a lot of people, a lot of whom are smart. Many of them have been doing this far longer than I have and are way more stuck in the technical details than I, most likely, will ever be. Still, I couldn’t help but wonder “what if …”.
A lot of the constraints on the platform are technical, and quite a few of them are non-technical. Of course there is a significant business interest from studios and production houses in not making all of their content available online, mostly DRM-free, in high-definition broadcast quality. Then who would buy the BluRay? (Hint: nobody, and it’s all your fault)
Another technical, yet non-technical, constraint is that we want just about anyone, anywhere, to be able to view the content. This can be tricky because not all the browsers and devices support the newer open standards that make video distribution painless (I’m looking at you, Apple).
Still, I wanted to see how far I could get on a Sunday morning setting up a streaming video that supports most modern web browsers, that does not violate the non-technical constraints, and that has the option of DRM. I have suspected for a while that all the above is possible, if unhindered by process, standups, and timekeeping. So, unencumbered with much technical knowledge of the subject matter, I went looking at what’s what in video streaming land.
Firstly, video these days needs to be of the adaptive bitrate variety; i.e., devices on a slow network get shitty quality, devices on a faster network get less shitty quality, and so on. This is not an optional feature in the world of unlimited 3G data and hotel WiFi.
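To make the idea concrete, here is a minimal sketch of the decision an adaptive player keeps making: pick the highest bitrate that still fits the measured bandwidth. The four rungs are the ones this experiment ends up with; the function name and the simple "highest that fits" rule are assumptions for illustration, and real players use more elaborate heuristics.

```shell
# Pick the highest rendition bitrate (kb/s) that fits the measured bandwidth.
# pick_rendition is a hypothetical helper, not part of any player.
pick_rendition() {
  bandwidth_kbps=$1
  choice=250                          # fall back to the lowest rung
  for rate in 250 500 750 1000; do
    if [ "$rate" -le "$bandwidth_kbps" ]; then
      choice=$rate                    # this rung still fits, keep climbing
    fi
  done
  echo "$choice"
}
```

Calling pick_rendition 600 selects the 500k rendition, and anything below 250 still gets the lowest rung rather than nothing at all.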
I’d heard various terms like HLS and Smooth Streaming tossed around the office. Some apparently work well with Apple, and some work well with not-Apple. Another term I’d heard was MPEG-DASH. That was the first thing I looked up on Wikipedia because I’d heard of it least. Immediately it triggered a “yass” response:
MPEG-DASH is the first adaptive bit-rate HTTP-based streaming solution that is an international standard. […] Unlike HLS, HDS, and Smooth Streaming, DASH is codec-agnostic, which means it can use content encoded with any coding format like H.265, H.264, VP9 etc.
Dope. Open standard and codec agnostic. That should fit in my budget of $0, and not give me much hassle selecting a codec.
Now then to find a video codec that meets my needs. I know that Netflix do many things well. One of which is delivering high-definition content to my TV via the world’s crappiest ADSL2+ line. The top result on Google for “netflix codec” was an article on the Variety website entitled Netflix Starts Using VP9 Codec, Saving Up to 36% of Bandwidth. I had read this article when it came out, but it didn’t stick in my memory.
Of course Netflix undoubtedly has many, many tricks up its billion-dollar sleeves to compress video much better than I ever could, but checking out VP9 might be worth the hassle. After spending some time in the Heckler & Koch sinkhole, I read the following on the Wikipedia:
VP9 is an open and royalty free video coding format developed by Google. […] In contrast to HEVC, VP9 support is common among web browsers. The combination of VP9 video and Opus audio in the WebM container, as served by YouTube, is supported by roughly ¾ of the browser market
Sweet. That’s many browsers. Or probably just Chrome, Edge, and Firefox. In any case, another $0 open standard nailed down.
I remember Opus from my time running a Mumble service for an MMO group. It’s a decent quality audio codec with good compression. Audio is not super relevant for this experiment, so Opus it is.
Much to my surprise the ancient version of ffmpeg I had installed supported both VP9 and Opus. A-transcoding we will go. I “borrowed” a video file from work. Sadly, we don’t get broadcast-quality MXF material to work with, so I grabbed one of the smallest current VoD assets I could find: a 3-minute, 12Mbit H.264 file with a cartoon extolling the virtues of some political party I had never heard of.
Easy things first, let’s demux and transcode the audio:
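The exact command is not shown here, so what follows is only a sketch of what an audio-only Opus transcode could look like. The 96k bitrate is an assumption, as is the transcode_audio wrapper and its FFMPEG override (handy for a dry run); the ffmpeg flags themselves (-vn, -c:a libopus, -b:a, -f webm, -dash 1) are standard.

```shell
# Hypothetical wrapper: drop the video stream and encode the audio as Opus in WebM.
# $1 = source file, $2 = output file, $3 = audio bitrate (assumed, e.g. 96k)
transcode_audio() {
  # FFMPEG can be overridden (e.g. FFMPEG=echo) to print the arguments instead of encoding
  ${FFMPEG:-ffmpeg} -i "$1" -vn -c:a libopus -b:a "$3" -f webm -dash 1 "$2"
}

# Usage:
#   transcode_audio POW_03384262_H264_12000.mpg audio.webm 96k
```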
A quick listen and pass with ffprobe shows this is probably fine:
$ ffprobe audio.webm
[...]
Duration: 00:02:59.98, start: -0.007000, bitrate: 104 kb/s
That’s the expected duration anyway, and I hear sounds. So yay.
Now, let’s create the video files:
ffmpeg -i POW_03384262_H264_12000.mpg -c:v libvpx-vp9 -keyint_min 150 \
  -g 150 -tile-columns 4 -frame-parallel 1 -f webm -dash 1 \
  -threads 4 \
  -an -vf scale=160:190 -b:v 250k video_160x90_250k.webm \
  -an -vf scale=320:180 -b:v 500k video_320x180_500k.webm \
  -an -vf scale=848:480 -b:v 750k video_840x480_750k.webm \
  -an -vf scale=1280:720 -b:v 1000k video_1280x720_1000k.webm
Astute readers will notice I done fucked up the aspect ratio on the 250k variant, and the filename on the 480p one. Oh well. It took me until after transcoding (which was painfully slow) to realize that mistake.
Do I know what all that stuff means? Nope, I just stole it from a helpful Mozilla website. We’ll assume the defaults are fine.
This created files with … interesting filesizes:
-rw-rw-r-- 1 cris cris 16M May 21 14:44 video_160x90_250k.webm
-rw-rw-r-- 1 cris cris 28M May 21 14:44 video_320x180_500k.webm
-rw-rw-r-- 1 cris cris 58M May 21 14:44 video_840x480_750k.webm
-rw-rw-r-- 1 cris cris 24M May 21 14:44 video_1280x720_1000k.webm
Oddly, the video_1280x720_1000k.webm file was the only one sticking to its assigned bitrate. The others have bitrates of respectively 724kb/s, 1273kb/s, and a whopping 2663kb/s. Mind you, this bitrate would be perfectly acceptable for streaming today but there is a reason I wanted to limit at 1Mbit.
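A quick sanity check on numbers like these is to turn a file size and duration back into an average bitrate. The sketch below assumes you already have the size in bytes and the duration in seconds (for instance from GNU stat -c %s and ffprobe); avg_kbps is a hypothetical helper name.

```shell
# Average bitrate in kb/s from a file size and a duration.
# $1 = file size in bytes, $2 = duration in seconds
avg_kbps() {
  awk -v bytes="$1" -v secs="$2" \
    'BEGIN { printf "%.0f\n", bytes * 8 / secs / 1000 }'
}

# Usage (16 MiB over roughly three minutes):
#   avg_kbps 16777216 180
```

A file of exactly 16 MiB over 180 seconds works out to about 746 kb/s, the same ballpark as the 724kb/s measured for the supposedly 250k variant, since ls rounds sizes up.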
Of course, I knew that specifying just the -b:v option would give me a variable bitrate, but I didn’t know it would be that variable. Back to the encoding board, and this time, let’s go with constant bitrates. Ideally, you’d probably want at least your high-quality stream to use a constant-quality (CRF) encode rather than forcing CBR like an idiot, but, like I said earlier, constraints.
Some fiddling with ffmpeg later and I had a postage-stamp sized CBR 250k file. Who needs pixels anyway?
Duration: 00:02:59.92, start: 0.000000, bitrate: 228 kb/s
Close enough. Let’s start this party up again:
ffmpeg -i POW_03384262_H264_12000.mpg -c:v libvpx-vp9 -keyint_min 150 \
  -g 150 -tile-columns 4 -frame-parallel 1 -f webm -dash 1 \
  -threads 4 \
  -an -vf scale=160:90 -b:v 250k -minrate 250k -maxrate 250k \
  -bufsize 64k video_160x90_250k.webm \
  -an -vf scale=320:180 -b:v 500k -minrate 500k -maxrate 500k \
  -bufsize 64k video_320x180_500k.webm \
  -an -vf scale=848:480 -b:v 750k -minrate 750k -maxrate 750k \
  -bufsize 64k video_848x480_750k.webm \
  -an -vf scale=1280:720 -b:v 1000k -minrate 1000k -maxrate 1000k \
  -bufsize 64k video_1280x720_1000k.webm
Before being able to continue, apparently we have to “Align the clusters to enable switching at cluster boundaries.” That’s fairly easily done with
mkvmuxer_sample -i audio.webm -o audio-final.webm \
  -output_cues 1 -cues_on_audio_track 1 -max_cluster_duration 2 \
  -audio_track_number
The question is of course “why.” Reading up a bit, it seems WebM is a subset of Matroska (MKV). In MKV, cues refer to clusters. And the description of clusters, according to the mkvtoolnix wiki, is:
In Matroska a cluster is a unit that contains a number of frames of all track types. A file is often made up of hundreds if not thousands of clusters. Normally the “cues” provide enough information for a player in order to find the cluster that contains a certain key frame a player wants to seek to. Some players also need the clusters themselves to be referenced in the “meta seek elements”.
So, essentially, they’re indices referring to your actual video and audio data. Cool. So that should really mean that we don’t try to switch bitrates midway through playing a cluster, or chunk, of a stream. Seems sensible.
Let’s do the same for the video files then:
mkvmuxer_sample -i video_160x90_250k.webm -o video_160x90_250k-final.webm
mkvmuxer_sample -i video_320x180_500k.webm -o video_320x180_500k-final.webm
mkvmuxer_sample -i video_848x480_750k.webm -o video_848x480_750k-final.webm
mkvmuxer_sample -i video_1280x720_1000k.webm -o video_1280x720_1000k-final.webm
A more in-depth read shows the reason we can’t leave it to ffmpeg is:
Anyway, the file sizes look a lot better, and it turns out the bitrates are what we expect. The 720p variant even looks good too:
Now it’s finally time to build the manifest!
webm_dash_manifest -o manifest.mpd \
  -as id=0,lang=dut \
  -r id=0,file=video_160x90_250k-final.webm \
  -r id=1,file=video_320x180_500k-final.webm \
  -r id=2,file=video_848x480_750k-final.webm \
  -r id=3,file=video_1280x720_1000k-final.webm \
  -as id=1,lang=dut \
  -r id=4,file=audio-final.webm
This came back with:
Warning profile is WebM On-Demand and AdaptationSet id:0 does not have subSegmentAlignment.
Well balls, that’s probably bad. Some fiddling later and it looks like video_160x90_250k-final.webm is the culprit. Without it, subsegment alignment appears to be fine. I don’t know why, but I can imagine the hilarious lack of pixels may have something to do with ffmpeg not being able to align its bits properly on its bytes. So I’m just going to leave that file out. Didn’t want it anyway.
In case you’re not familiar with manifest files, they are XML files that look somewhat like this:
<Period id="0" start="PT0S" duration="PT179.981S" >
  <AdaptationSet id="0" mimeType="video/webm" codecs="vp9" lang="dut" subsegmentAlignment="true" subsegmentStartsWithSAP="1" bitstreamSwitching="true">
    <Representation id="0" bandwidth="490835" width="320" height="180">
      <BaseURL>video_320x180_500k-final.webm</BaseURL>
      <SegmentBase indexRange="10751322-10751923">
        <Initialization range="0-262" />
      </SegmentBase>
    </Representation>
Basically, it describes the stream and what bitrates are available. You’ll notice that a SegmentBase is provided. That points to the cues that we added earlier and allows the player to know where to find which clusters.
Of course, having individual clusters is useless if your webserver is naive and only supports linear downloads. Fortunately, I’m on nginx which tends to be not terrible in terms of features that Just Work.
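If you want to check that a server really honors partial downloads, one way is to ask for a small byte range and see whether it answers 206 Partial Content instead of 200. This is a minimal sketch using curl; supports_ranges is a hypothetical helper, the URL in the usage line is a placeholder, and the 0-262 range is simply the Initialization range from the manifest above.

```shell
# Succeeds (exit status 0) if the server answers a byte-range request with 206.
# $1 = URL of a media file; only the first 263 bytes are requested
supports_ranges() {
  status=$(curl -s -o /dev/null -r 0-262 -w '%{http_code}' "$1")
  [ "$status" = "206" ]
}

# Usage (placeholder URL):
#   supports_ranges http://localhost/video_320x180_500k-final.webm && echo "ranges OK"
```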
Cool. So we’re all set up server-side. Now to get the player. My first try was using the dash.js reference player. This player worked well with my newly created manifest and files … except on Safari. But we knew that could be problematic with the codec used.
Nope. Okay, no big deal. I had already made a 1Mbit H.264 file with an AAC audio track because I was expecting this. video.js lets you add a second source and it will fall back on that. So there we go: Safari works too! It would probably be an option to implement HLS here to also support adaptive streaming, but I’m well into the afternoon now and that’s really enough.
Sadly, playback on neither Safari/iOS nor Chrome/Android seems to work. The good news is that these should be relatively easy to get going, as long as you feed them the correct format of files.
And now for the final comparison of quality. The VP9 file with all its terrible default settings:
Compared to its highly optimized commercial-grade H.264 counterpart:
Honestly, I’d say we have a pretty clear winner in the quality category here. And the VP9 file has the added benefit of being 22MiB, whereas the H.264 file is 28MiB. So it’s roughly 21% smaller, before optimizing. That said, transcoding my source material to VP9 was significantly slower than transcoding to H.264.
Of course, those pesky mobile devices would still need work. As would outdated browsers, “smart” TVs, etc. It would be a significant investment in infrastructure to roll out something like MPEG-DASH/VP9 correctly.
Is it worth it? I guess that depends on whether you believe this “Internet” the kids use these days could catch on …
Cris van Pelt