Why do sliced threads affect realtime encoding so much when using ffmpeg x264?


Question

I'm using ffmpeg with libx264 to encode a 720p screen captured from x11 in realtime at 30 fps. When I use the -tune zerolatency parameter, the average encode time per frame can be as large as 12 ms with the baseline profile.

After studying the ffmpeg and x264 source code, I found that the key parameter leading to such a long encode time is sliced-threads, which is enabled by -tune zerolatency. After disabling it with -x264-params sliced-threads=0, the encode time can be as low as 2 ms.
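The two configurations compared above could be assembled roughly as follows. This is only a sketch: the helper name build_capture_command, the display string ":0.0", and the null output are assumptions made for illustration, not the asker's actual command line.

```python
import shlex


def build_capture_command(sliced_threads_enabled, display=":0.0"):
    """Return an ffmpeg argument list for a 720p, 30 fps x11 capture.

    The display name and the discarded ("null") output are illustrative
    assumptions; only the libx264 options come from the question.
    """
    cmd = [
        "ffmpeg",
        "-f", "x11grab",            # capture from an X11 display
        "-framerate", "30",
        "-video_size", "1280x720",
        "-i", display,
        "-c:v", "libx264",
        "-profile:v", "baseline",
        "-tune", "zerolatency",     # this tune turns sliced-threads on
    ]
    if not sliced_threads_enabled:
        # Explicitly switch slice-based threading back off.
        cmd += ["-x264-params", "sliced-threads=0"]
    cmd += ["-f", "null", "-"]      # encode and discard the output
    return cmd


for enabled in (True, False):
    print(shlex.join(build_capture_command(enabled)))
```

Printing the commands instead of running them keeps the sketch independent of whether ffmpeg or an X server is available.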

With sliced-threads disabled, the CPU usage is 40%, compared with only 20% when it is enabled.

Can someone explain the details of sliced threads, especially in realtime encoding (assume no frames are buffered for encoding; a frame is encoded only when it is captured)?

Recommended answer

The documentation (http://git.videolan.org/?p=x264.git;a=blob_plain;f=doc/threads.txt;h=cea1f6576b8c82f93b787e5f5c400aac9e6b3213;hb=b9461a15b33936a6fd5583da843c132d4fe030f6) shows that frame-based threading has better throughput than slice-based threading. It also notes that the latter doesn't scale well because parts of the encoder are serial.

Speedup vs. number of encoding threads for the veryfast preset (non-realtime):

threads  speedup       psnr
      slice frame   slice  frame
x264 --preset veryfast --tune psnr --crf 30
 1:   1.00x 1.00x  +0.000 +0.000
 2:   1.41x 2.29x  -0.005 -0.002
 3:   1.70x 3.65x  -0.035 +0.000
 4:   1.96x 3.97x  -0.029 -0.001
 5:   2.10x 3.98x  -0.047 -0.002
 6:   2.29x 3.97x  -0.060 +0.001
 7:   2.36x 3.98x  -0.057 -0.001
 8:   2.43x 3.98x  -0.067 -0.001
 9:         3.96x         +0.000
10:         3.99x         +0.000
11:         4.00x         +0.001
12:         4.00x         +0.001
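To make the gap in the table above easier to read, the speedup columns can be compared directly. The sketch below only re-uses the numbers from the table; the names SLICE_SPEEDUP, FRAME_SPEEDUP and throughput_ratio are introduced here for illustration.

```python
# Speedups copied from the table above
# (x264 --preset veryfast --tune psnr --crf 30).
SLICE_SPEEDUP = {1: 1.00, 2: 1.41, 3: 1.70, 4: 1.96,
                 5: 2.10, 6: 2.29, 7: 2.36, 8: 2.43}
FRAME_SPEEDUP = {1: 1.00, 2: 2.29, 3: 3.65, 4: 3.97,
                 5: 3.98, 6: 3.97, 7: 3.98, 8: 3.98}


def throughput_ratio(threads):
    """How many times faster frame threading is than slice threading
    at the same thread count, according to the table."""
    return FRAME_SPEEDUP[threads] / SLICE_SPEEDUP[threads]


for n in sorted(SLICE_SPEEDUP):
    print(f"{n} threads: frame threading is {throughput_ratio(n):.2f}x "
          f"the throughput of slice threading")
```

At 4 threads, for example, frame threading delivers roughly twice the throughput of slice threading (3.97x vs. 1.96x).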

The main difference seems to be that frame threading adds frame latency, because each thread needs a different frame to work on, while with slice-based threading all threads work on the same frame. In realtime encoding, unlike offline encoding, the encoder would have to wait for more frames to arrive before the pipeline is full.


Normal threading, also known as frame-based threading, uses a clever staggered-frame system for parallelism. But it comes at a cost: as mentioned earlier, every extra thread requires one more frame of latency. Slice-based threading has no such issue: every frame is split into slices, each slice encoded on one core, and then the result slapped together to make the final frame. Its maximum efficiency is much lower for a variety of reasons, but it allows at least some parallelism without an increase in latency.

From: x264 developer's diary
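The rule quoted above ("every extra thread requires one more frame of latency") translates into wall-clock delay once a frame rate is fixed. The following is a minimal sketch of that arithmetic only; the function name added_latency_ms is hypothetical, and other latency sources in a real encoder are ignored.

```python
def added_latency_ms(threads, fps, frame_threading):
    """Delay added by threading alone, per the rule quoted above.

    Frame-based threading: one extra frame of latency per extra thread.
    Slice-based threading: no added frame latency.
    """
    if threads < 1 or fps <= 0:
        raise ValueError("threads must be >= 1 and fps must be > 0")
    extra_frames = (threads - 1) if frame_threading else 0
    return extra_frames * 1000.0 / fps


for n in (1, 2, 4, 8):
    frame_ms = added_latency_ms(n, 30, frame_threading=True)
    slice_ms = added_latency_ms(n, 30, frame_threading=False)
    print(f"{n} threads at 30 fps: frame-based +{frame_ms:.1f} ms, "
          f"slice-based +{slice_ms:.1f} ms")
```

At the 30 fps used in the question, each frame lasts about 33 ms, so four frame threads would add about 100 ms before the first encoded frame can leave the encoder.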


Sliceless threading: example with 2 threads. Start encoding frame #0. When it's half done, start encoding frame #1. Thread #1 now only has access to the top half of its reference frame, since the rest hasn't been encoded yet. So it has to restrict the motion search range. But that's probably ok (unless you use lots of threads on a small frame), since it's pretty rare to have such long vertical motion vectors. After a little while, both threads have encoded one row of macroblocks, so thread #1 still gets to use motion range = +/- 1/2 frame height. Later yet, thread #0 finishes frame #0, and moves on to frame #2. Thread #0 now gets motion restrictions, and thread #1 is unrestricted.

From: http://web.archive.org/web/20150307123140/http://akuvian.org/src/x264/sliceless_threads.txt
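The motion restriction described in the two-thread example can be illustrated with a toy model. This is not x264's actual logic: it assumes both threads advance one macroblock row at a time at equal speed, uses 16-pixel macroblock rows, and ignores any extra margin a real encoder needs for interpolation or filtering. All names are hypothetical.

```python
MB_SIZE = 16  # macroblock height in pixels


def reference_rows_done(current_row, lead_rows, total_rows):
    """Rows of the reference frame already encoded when the dependent
    thread starts its row `current_row` (equal-speed assumption)."""
    return min(total_rows, current_row + lead_rows)


def max_downward_mv_rows(current_row, lead_rows, total_rows):
    """Furthest a motion vector may point downward, in macroblock rows."""
    return reference_rows_done(current_row, lead_rows, total_rows) - 1 - current_row


def is_restricted(current_row, lead_rows, total_rows):
    """True while the reference frame is still being encoded."""
    return current_row + lead_rows < total_rows


total_rows = 720 // MB_SIZE      # 45 macroblock rows in a 720p frame
lead_rows = total_rows // 2      # second thread starts at the halfway point

for row in (0, 10, 22, 30):
    limit = max_downward_mv_rows(row, lead_rows, total_rows)
    state = "restricted" if is_restricted(row, lead_rows, total_rows) else "unrestricted"
    print(f"row {row:2d}: downward range {limit} rows "
          f"({limit * MB_SIZE} px), {state}")
```

While the restriction applies, the downward range stays at about half the frame height, which matches the "+/- 1/2 frame height" figure in the quoted text.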

Therefore it makes sense for -tune zerolatency to enable sliced-threads: you need to send each frame out as soon as possible rather than encode it efficiently (in terms of both performance and quality).

Using too many threads, on the other hand, can hurt performance, because the overhead of maintaining them can exceed the potential gains.
