I was sitting in a Montclair coffee shop in the fall of 2014, learning how to work with MPEG-DASH while I built out a proof of concept live streaming platform. I saw how quickly consumer entertainment apps were shifting focus to video and it seemed important to know how to work with the surrounding technologies.
Next to me were a couple of other programmers working on an actual video product: a web-based app that generated movie credits for films and television. I was curious, and they told me ending credits usually got stuck with the video editors or were outsourced to a third party.
With their app, you could generate credits from a set of templates and export to any video format in a fraction of the time and cost. I thought this was a really novel SaaS product.
My goal for the project was simple. I wanted to broadcast video from my computer to a remote server and serve it to a few clients on both desktop and mobile.
Conceptually it’s simple. Video software sends video data to a server, the server copies it to a storage location, and a CDN distributes it to viewers. But there’s a bunch of stuff going on behind the scenes.
When video data is sent to a server, it's uploaded in chunks. Each of those chunks is then encoded into several different sizes and bitrates, so a client requesting it gets exactly what it needs for a good experience.
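To make that concrete, here's a sketch in Python of what the transcoding step looks like. It just builds one ffmpeg command per rendition rather than running anything; the rendition ladder, bitrates, and filenames are illustrative values I've made up, and it assumes ffmpeg with the libx264 encoder is available:

```python
# Sketch: build ffmpeg commands for a small rendition ladder.
# The ladder values and filenames are illustrative, not from a real service.

RENDITIONS = [
    {"name": "1080p", "size": "1920x1080", "bitrate": "5000k"},
    {"name": "720p",  "size": "1280x720",  "bitrate": "2800k"},
    {"name": "480p",  "size": "854x480",   "bitrate": "1400k"},
]

def transcode_commands(source: str) -> list[list[str]]:
    """Return one ffmpeg argument list per rendition (not executed here)."""
    commands = []
    for r in RENDITIONS:
        commands.append([
            "ffmpeg", "-i", source,
            "-c:v", "libx264",        # H.264 video
            "-b:v", r["bitrate"],     # target video bitrate
            "-s", r["size"],          # output resolution
            "-c:a", "aac",            # AAC audio
            f"chunk_{r['name']}.mp4",
        ])
    return commands
```

In a real pipeline each chunk would be transcoded into every rendition in parallel, which is exactly why the CPU cost adds up so quickly.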
For example, if you're watching video on a desktop with a great internet connection, a video player will download high-quality 1920x1080 video. But if your connection dips for a few minutes, the player will detect the bandwidth issue and automatically switch down to something like 720x480. This is called adaptive bitrate streaming.
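The player-side decision can be sketched in a few lines. Assuming the player measures throughput over the last few chunks, it picks the highest rendition whose bitrate fits comfortably under that estimate. The ladder values and the 0.8 safety factor here are my own illustrative choices, not any particular player's internals:

```python
# Sketch of adaptive bitrate selection: pick the best rendition
# whose bitrate fits under measured bandwidth, with some headroom.

LADDER = [  # (name, bitrate in kbit/s), highest quality first -- illustrative
    ("1080p", 5000),
    ("720p", 2800),
    ("480p", 1400),
]

def pick_rendition(bandwidth_kbps: float, safety: float = 0.8) -> str:
    """Choose the highest-quality rendition that fits under bandwidth * safety."""
    budget = bandwidth_kbps * safety
    for name, bitrate in LADDER:
        if bitrate <= budget:
            return name
    return LADDER[-1][0]  # nothing fits: fall back to the lowest rendition
```

With 8000 kbit/s measured you'd get 1080p; drop to 2000 kbit/s and the same logic falls back to 480p.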
I learned through testing that transcoding for adaptive bitrate is a very expensive optimization. Transcoding video is extremely CPU-intensive, which is why cloud transcoding services use dedicated GPU servers optimized for video processing. Workloads are sharded across racks of dedicated workers, and broadcasts are scaled horizontally based on demand.
There was definitely no way I'd be able to support transcoding for more than a few users at a time, or adaptive bitrate at all, on my little test server.
After the video is transcoded, it gets stored and served. Storage is cheap, thankfully. But you don't want to serve video straight from the storage location, because that adds latency to clients' requests. Instead, you front the storage with a CDN so video is served from somewhere geographically closer to the viewer, cutting way down on latency.
In my demo I kept things simple. I transcoded, stored, and served the video all from the same server. It was interesting to learn how all the pieces came together but it wasn't worth the time or effort to build something for Web Scale™ when I was going to scrap it afterwards.
The final step of delivery is playback, and for that I needed a video player. I was working with the H.264 codec (via the x264 encoder), which at the time was cutting edge. My choices for video players that supported it were limited. I tried a few but went with JW Player.
I found it really challenging to debug live video playback locally. My workstation could barely handle broadcasting, transcoding, storing, and serving video all at once, and the performance issues caused delays or interruptions in one process or another.
For example, if the transcoding lagged, my video player would request the next chunk, conclude it wasn't available yet or that the stream had ended, and playback would freeze.
At the time I found it difficult to tune the internal behaviors of video players to be more tolerant of network hiccups. I could never really get smooth video working on my local machine for more than 30 seconds at a time before it buckled in on itself.
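What I wanted was roughly the following behavior: if a chunk request fails, retry a few times with a short backoff before declaring the stream over. This is a hypothetical sketch of that tolerance logic, not JW Player's actual internals; `fetch_chunk` here is a stand-in for a real HTTP request:

```python
import time

def fetch_with_retry(fetch_chunk, chunk_id, retries=3, backoff_s=0.5):
    """Try fetching a chunk a few times before giving up.

    fetch_chunk is a stand-in for a real HTTP GET: it should return the
    chunk bytes, or raise an exception when the chunk isn't ready yet.
    """
    for attempt in range(retries):
        try:
            return fetch_chunk(chunk_id)
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: treat the stream as ended
            time.sleep(backoff_s * (attempt + 1))  # linear backoff
```

A lagging transcoder usually catches up within a second or two, so even a couple of retries like this would have papered over most of the freezes I saw.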
But when I hosted the transcoding on a remote server it worked fine. I remember thinking it was the coolest thing ever that I could broadcast a live stream to my test website on my own infrastructure. I was able to serve 5 or so clients before it crashed and burned, but it was still really fun to see it work.
The biggest lesson I learned from this project is that if you want to build a live video product you better have a ton of cash behind you to pay for the gap in monetization. It would likely be impossible to bootstrap a business that relied heavily on live video because of the upfront capital required to grow your audience large enough for advertisers to be interested.