HLS streaming, keyframes, scene-cut & GOP
Taking a stab at .ts segments, variable GOP length, scene-cut and keyframes for streaming to ABR platforms.
If you've ever set up your own streaming platform or server with HLS, or just messed around with streaming software and tried to stream to HLS platforms, you might have run into some quirks caused by inconsistent segmenting. These quirks can cause buffering for the viewers and general incompatibility with certain players. They can also mess with transcodes (converting to lower resolutions/bitrates), which ABR (adaptive bitrate streaming) depends on.
This will mainly focus on h264 (libx264 and nvenc), but most of these concepts will transfer or be somewhat applicable to other codecs and encoders.
The HLS segmenter's job is to turn a stream into smaller files (.ts segments) and add these to the index (.m3u8 file). These segments can't be too big, as that increases latency, and they should also be somewhat consistent in length and size, so that ABR can do its job and the viewers/players can tell whether they can handle the current stream or not.
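To make the index file a bit more concrete, here is a minimal sketch that reads a small media playlist and flags segments that are longer than the playlist says they may be. The playlist contents and segment names are made up for illustration. The check follows the HLS rule that each segment's EXTINF duration, rounded to the nearest integer, should not exceed EXT-X-TARGETDURATION.

```python
# Hypothetical playlist of the kind a segmenter might write (names are made up).
PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:2
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:2.000,
seg0.ts
#EXTINF:2.000,
seg1.ts
#EXTINF:3.400,
seg2.ts
"""


def check_playlist(text):
    """Return (target_duration, [(uri, duration, within_target), ...])."""
    target = None
    pending = None  # duration from the most recent #EXTINF line
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>"
            pending = float(line.split(":", 1)[1].split(",", 1)[0])
        elif line and not line.startswith("#") and pending is not None:
            segments.append((line, pending))
            pending = None
    if target is None:
        raise ValueError("playlist has no EXT-X-TARGETDURATION tag")
    # Round half up, then compare against the declared target duration.
    return target, [(uri, dur, int(dur + 0.5) <= target) for uri, dur in segments]


target, segments = check_playlist(PLAYLIST)
for uri, dur, ok in segments:
    print(f"{uri}: {dur:.3f}s {'ok' if ok else 'exceeds target of ' + str(target) + 's'}")
```

In this made-up playlist the third segment runs long, which is the kind of inconsistency that tends to show up when keyframes don't line up with the segment length.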
The segmenter decides where to "slice" the video and start a new segment based on a couple of things. The first is the maximum fragment length, which you can specify in the configuration file of the segmenter, though this could also be set to adapt. If a segment exceeds this length, it will cause issues/stalls.
The second factor is that the segmenter shouldn't just slice wherever it wants, as the cut should be on an I-frame (this might be only on IDR frames, but some implementations might slice on any I-frame). These two factors together will determine the length (and with it the size) of the .ts fragments.
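How these two factors interact can be sketched with a simplified model: the segmenter starts a new segment at the first keyframe that arrives at or after the target length into the current segment. This is an assumption about segmenter behaviour made for illustration, not a description of any specific implementation, and the keyframe timestamps are made up.

```python
def cut_segments(keyframe_times, stream_end, target):
    """Return segment durations in seconds for a simplified segmenter.

    A new segment starts at the first keyframe that is at least `target`
    seconds after the start of the current segment.
    """
    cuts = [0.0]
    for t in keyframe_times:
        if t - cuts[-1] >= target:
            cuts.append(t)
    cuts.append(stream_end)
    return [round(end - start, 3) for start, end in zip(cuts, cuts[1:])]


# Fixed 2 second GOP: every segment comes out at the same length.
fixed = cut_segments([0.0, 2.0, 4.0, 6.0, 8.0], stream_end=10.0, target=2.0)
print(fixed)     # [2.0, 2.0, 2.0, 2.0, 2.0]

# Variable GOP (hypothetical keyframe times): cuts land late and unevenly.
variable = cut_segments([0.0, 1.3, 3.1, 3.6, 7.0, 8.2], stream_end=10.0, target=2.0)
print(variable)  # [3.1, 3.9, 3.0]
```

With keyframes on an exact 2 second grid the segments are uniform. With irregular keyframes the same 2 second target produces segments of 3 seconds or more, because the segmenter has to wait for the next usable keyframe.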
A common choice is a fixed GOP length of 2 seconds (a 2 second keyframe interval). This is a nice middle ground between load times, latency and compression efficiency, which ultimately impacts quality.
Keyframe insertion & GOP
Most platforms highly recommend using a fixed GOP length (strict), meaning that IDR frames are placed at a set interval and the GOP length stays consistent. This is for the reasons mentioned earlier: smooth transitions between ABR streams, consistent timing and size of the segments, as well as predictable load on the segmenter, CDNs and clients (viewers).
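As a rough sketch of what a fixed GOP looks like in practice, the helper below turns a frame rate and a keyframe interval in seconds into ffmpeg encoder arguments. The function name is hypothetical. The flags are the ones I know from ffmpeg (`-g`, `-keyint_min` and `-sc_threshold` for libx264, `-g` and `-no-scenecut` for h264_nvenc), but builds differ, so treat them as an assumption and check `ffmpeg -h encoder=libx264` or `ffmpeg -h encoder=h264_nvenc` on your own install.

```python
def fixed_gop_args(encoder, fps, keyint_seconds=2):
    """Build ffmpeg video encoder arguments for a fixed GOP length.

    Hypothetical helper: the GOP length in frames is fps * keyint_seconds,
    rounded to the nearest whole frame.
    """
    gop = int(round(fps * keyint_seconds))
    if encoder == "libx264":
        # max keyint == min keyint, and scene-cut detection turned off
        return ["-c:v", "libx264", "-g", str(gop),
                "-keyint_min", str(gop), "-sc_threshold", "0"]
    if encoder == "h264_nvenc":
        # assumed flag for disabling adaptive I-frame insertion on nvenc
        return ["-c:v", "h264_nvenc", "-g", str(gop), "-no-scenecut", "1"]
    raise ValueError(f"unsupported encoder: {encoder}")


print(" ".join(fixed_gop_args("libx264", fps=60)))
print(" ".join(fixed_gop_args("h264_nvenc", fps=30)))
```

At 60 fps a 2 second interval works out to a GOP of 120 frames, and at 30 fps to 60 frames. For fractional rates such as 29.97 fps the rounded value is only approximately 2 seconds.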
For h264, which we are primarily focused on, I can't really recommend going down the variable GOP length route. It might be tempting, but know that it's a lot of work and frustration, and a lot of platforms won't support it properly, if at all. Do as they recommend, or you're asking for issues. You need to be prepared to spend a lot of time working out the small details, and it's probably not worth doing.
It's true that we could theoretically achieve better compression, resulting in a higher quality stream, if we didn't have to worry about these limitations and could make full use of scene-cut and flexible GOP length. If you happen to run your own server/platform, feel free to experiment with it, but it's most likely not going to be easy or straightforward. Scene-cut could be somewhat feasible, flexible GOP length might be a bit more annoying. This may also depend on the HLS implementation and how it splits segments.
Scene-cut allows the encoder to insert I-frames when it detects a large change in the sequence, as that should be most efficient and gives the frames around it a good reference point.
Scene-cut can make these I-frames IDR frames, depending on the min/max keyframe interval. If it wants to scene-cut (insert an I-frame) at a point that is closer to the previous IDR frame than the min-keyint threshold allows, the frame will just be a regular (non-IDR) I-frame and not an IDR frame. This will result in a flexible GOP length, unless you use strict values, at which point the benefit of using it is more or less gone.
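The interaction between scene-cut and the min/max keyframe interval can be sketched as a toy frame-type decision. This is a simplified model of the behaviour described above, not the actual logic of libx264 or nvenc, and the scene-cut positions are made up. Every non-keyframe is labelled "P" for brevity.

```python
def frame_types(n_frames, scene_cuts, keyint, min_keyint):
    """Label each frame as IDR, I or P under a simplified scene-cut model."""
    types = []
    last_idr = 0
    for i in range(n_frames):
        dist = i - last_idr  # frames since the previous IDR frame
        if i == 0 or dist >= keyint:
            types.append("IDR")  # forced by the max keyframe interval
            last_idr = i
        elif i in scene_cuts:
            if dist >= min_keyint:
                types.append("IDR")  # scene-cut far enough from the last IDR
                last_idr = i
            else:
                types.append("I")  # too close to the last IDR: non-IDR I-frame
        else:
            types.append("P")
    return types


def idr_positions(types):
    return [i for i, t in enumerate(types) if t == "IDR"]


cuts = {10, 90}  # hypothetical scene changes at frames 10 and 90

flexible = frame_types(300, cuts, keyint=120, min_keyint=30)
print(idr_positions(flexible))  # [0, 90, 210]

strict = frame_types(300, cuts, keyint=120, min_keyint=120)
print(idr_positions(strict))    # [0, 120, 240]
```

With a low min-keyint, the scene change at frame 90 becomes an IDR frame and shifts every later GOP boundary. With min-keyint equal to keyint, the IDR frames stay on a fixed grid and the scene changes only produce extra non-IDR I-frames.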
Short answer, just disable it. The minute possible quality gain is not worth the effort of tweaking this.
Are you sure that you want to spend a ton of time on this? The quality impact is most likely quite negligible in the grand scheme of things. Streaming is not perfect quality, and won't be for the foreseeable future.
A lot of people before you have already spent a ton of time digging into this and testing it, and following their recommendations is going to take a lot less time. There's a decent chance you will end up like me and see that all of these changes come at a cost, and in the end don't really improve quality to any noticeable extent (a few VMAF points perhaps).
A larger keyframe interval could also give a slight quality boost at the cost of latency, and is probably more worthwhile to consider. A 3 second keyframe interval is the max I would personally use.
Variable GOP length is a headache on most platforms, and I wouldn't recommend it.
Reminder, you can always reach out to me through the contact form if you spot an error, inconsistency or poorly written/explained concept or topic.