Libav (ffmpeg) copying decoded video timestamps to encoder

Problem description

I am writing an application that decodes a single video stream from an input file (any codec, any container), does a bunch of image processing, and encodes the results to an output file (single video stream, Quicktime RLE, MOV). I am using ffmpeg's libav 3.1.5 (Windows build for now, but the application will be cross-platform).

There is a 1:1 correspondence between input and output frames and I want the frame timing in the output to be identical to the input. I am having a really, really hard time accomplishing this. So my general question is: How do I reliably (as in, in all cases of inputs) set the output frame timing identical to the input?

It took me a very long time to slog through the API and get to the point I am at now. I put together a minimal test program to work with:

#include <cstdio>

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>
}

using namespace std;


struct DecoderStuff {
    AVFormatContext *formatx;
    int nstream;
    AVCodec *codec;
    AVStream *stream;
    AVCodecContext *codecx;
    AVFrame *rawframe;
    AVFrame *rgbframe;
    SwsContext *swsx;
};


struct EncoderStuff {
    AVFormatContext *formatx;
    AVCodec *codec;
    AVStream *stream;
    AVCodecContext *codecx;
};


template <typename T>
static void dump_timebase (const char *what, const T *o) {
    if (o)
        printf("%s timebase: %d/%d\n", what, o->time_base.num, o->time_base.den);
    else
        printf("%s timebase: null object\n", what);
}


// reads next frame into d.rawframe and d.rgbframe. returns false on error/eof.
static bool read_frame (DecoderStuff &d) {

    AVPacket packet;
    int err = 0, haveframe = 0;

    // read
    while (!haveframe && err >= 0 && ((err = av_read_frame(d.formatx, &packet)) >= 0)) {
       if (packet.stream_index == d.nstream) {
           err = avcodec_decode_video2(d.codecx, d.rawframe, &haveframe, &packet);
       }
       av_packet_unref(&packet);
    }

    // error output
    if (!haveframe && err != AVERROR_EOF) {
        char buf[500];
        av_strerror(err, buf, sizeof(buf) - 1);
        buf[499] = 0;
        printf("read_frame: %s\n", buf);
    }

    // convert to rgb
    if (haveframe) {
        sws_scale(d.swsx, d.rawframe->data, d.rawframe->linesize, 0, d.rawframe->height,
                  d.rgbframe->data, d.rgbframe->linesize);
    }

    return haveframe;

}


// writes an output frame, returns false on error.
static bool write_frame (EncoderStuff &e, AVFrame *inframe) {

    // see the notes below for why outframe is allocated fresh here
    AVFrame *outframe = av_frame_alloc();
    outframe->format = inframe->format;
    outframe->width = inframe->width;
    outframe->height = inframe->height;
    av_image_alloc(outframe->data, outframe->linesize, outframe->width, outframe->height,
                   AV_PIX_FMT_RGB24, 1);
    //av_frame_copy(outframe, inframe);
    static int count = 0;
    for (int n = 0; n < outframe->width * outframe->height; ++ n) {
        outframe->data[0][n*3+0] = ((n+count) % 100) ? 0 : 255;
        outframe->data[0][n*3+1] = ((n+count) % 100) ? 0 : 255;
        outframe->data[0][n*3+2] = ((n+count) % 100) ? 0 : 255;
    }
    ++ count;

    AVPacket packet;
    av_init_packet(&packet);
    packet.size = 0;
    packet.data = NULL;

    int err, havepacket = 0;
    if ((err = avcodec_encode_video2(e.codecx, &packet, outframe, &havepacket)) >= 0 && havepacket) {
        packet.stream_index = e.stream->index;
        err = av_interleaved_write_frame(e.formatx, &packet);
    }

    if (err < 0) {
        char buf[500];
        av_strerror(err, buf, sizeof(buf) - 1);
        buf[499] = 0;
        printf("write_frame: %s\n", buf);
    }

    av_packet_unref(&packet);
    av_freep(&outframe->data[0]);
    av_frame_free(&outframe);

    return err >= 0;

}


int main (int argc, char *argv[]) {

    const char *infile = "wildlife.wmv";
    const char *outfile = "test.mov";
    DecoderStuff d = {};
    EncoderStuff e = {};

    av_register_all();

    // decoder
    avformat_open_input(&d.formatx, infile, NULL, NULL);
    avformat_find_stream_info(d.formatx, NULL);
    d.nstream = av_find_best_stream(d.formatx, AVMEDIA_TYPE_VIDEO, -1, -1, &d.codec, 0);
    d.stream = d.formatx->streams[d.nstream];
    d.codecx = avcodec_alloc_context3(d.codec);
    avcodec_parameters_to_context(d.codecx, d.stream->codecpar);
    avcodec_open2(d.codecx, NULL, NULL);
    d.rawframe = av_frame_alloc();
    d.rgbframe = av_frame_alloc();
    d.rgbframe->format = AV_PIX_FMT_RGB24;
    d.rgbframe->width = d.codecx->width;
    d.rgbframe->height = d.codecx->height;
    av_frame_get_buffer(d.rgbframe, 1);
    d.swsx = sws_getContext(d.codecx->width, d.codecx->height, d.codecx->pix_fmt,
                            d.codecx->width, d.codecx->height, AV_PIX_FMT_RGB24,
                            SWS_POINT, NULL, NULL, NULL);
    //av_dump_format(d.formatx, 0, infile, 0);
    dump_timebase("in stream", d.stream);
    dump_timebase("in stream:codec", d.stream->codec); // note: deprecated
    dump_timebase("in codec", d.codecx);

    // encoder
    avformat_alloc_output_context2(&e.formatx, NULL, NULL, outfile);
    e.codec = avcodec_find_encoder(AV_CODEC_ID_QTRLE);
    e.stream = avformat_new_stream(e.formatx, e.codec);
    e.codecx = avcodec_alloc_context3(e.codec);
    e.codecx->bit_rate = 4000000; // arbitrary for qtrle
    e.codecx->width = d.codecx->width;
    e.codecx->height = d.codecx->height;
    e.codecx->gop_size = 30; // 99% sure this is arbitrary for qtrle
    e.codecx->pix_fmt = AV_PIX_FMT_RGB24;
    e.codecx->time_base = d.stream->time_base; // ???
    e.codecx->flags |= (e.formatx->flags & AVFMT_GLOBALHEADER) ? AV_CODEC_FLAG_GLOBAL_HEADER : 0;
    avcodec_open2(e.codecx, NULL, NULL);
    avcodec_parameters_from_context(e.stream->codecpar, e.codecx); 
    //av_dump_format(e.formatx, 0, outfile, 1);
    dump_timebase("out stream", e.stream);
    dump_timebase("out stream:codec", e.stream->codec); // note: deprecated
    dump_timebase("out codec", e.codecx);

    // open file and write header
    avio_open(&e.formatx->pb, outfile, AVIO_FLAG_WRITE); 
    avformat_write_header(e.formatx, NULL);

    // frames
    while (read_frame(d) && write_frame(e, d.rgbframe))
        ;

    // write trailer and close file
    av_write_trailer(e.formatx);
    avio_closep(&e.formatx->pb); 

}

Some notes on this:
  • Since all of my attempts at frame timing so far have failed, I've removed almost all timing-related stuff from this code to start with a clean slate.
  • Almost all error checking and cleanup omitted for brevity.
  • The reason I allocate a new output frame with a new buffer in write_frame, rather than using inframe directly, is because this is more representative of what my real application is doing. My real app also uses RGB24 internally, hence the conversions here.
  • The reason I generate a weird pattern in outframe, rather than using e.g. av_copy_frame, is because I just wanted a test pattern that compressed well with Quicktime RLE (my test input ends up generating a 1.7GB output file otherwise).
  • The input video I am using, "wildlife.wmv", can be found here. I've hard-coded the filenames.
  • I am aware that avcodec_decode_video2 and avcodec_encode_video2 are deprecated, but don't care. They work fine, I've already struggled too much getting my head around the latest version of the API, ffmpeg changes their API with nearly every release, and I really don't feel like dealing with avcodec_send_* and avcodec_receive_* right now.
  • I think I'm supposed to be finishing off by passing a NULL frame to avcodec_encode_video2 to flush some buffers or something but I'm a bit confused about that. Unless somebody feels like explaining that let's ignore it for now, it's a separate question. The docs are as vague about this point as they are about everything else.
  • My test input file's frame rate is 29.97.

Now, as for my current attempts. The following timing related fields are present in the above code, with details/confusion in bold. There's a lot of them, because the API is mind-bogglingly convoluted:

  • main: d.stream->time_base: Input video stream time base. For my test input file this is 1/1000.
  • main: d.stream->codec->time_base: Not sure what this is (I never could make sense of why AVStream has an AVCodecContext field when you always use your own new context anyways) and also the codec field is deprecated. For my test input file this is 1/1000.
  • main: d.codecx->time_base: Input codec context time-base. For my test input file this is 0/1. Am I supposed to set it?
  • main: e.stream->time_base: Time base of the output stream I create. What do I set this to?
  • main: e.stream->codec->time_base: Time base of the deprecated and mysterious codec field of the output stream I create. Do I set this to anything?
  • main: e.codecx->time_base: Time base of the encoder context I create. What do I set this to?
  • read_frame: packet.dts: Decoding timestamp of packet read.
  • read_frame: packet.pts: Presentation timestamp of packet read.
  • read_frame: packet.duration: Duration of packet read.
  • read_frame: d.rawframe->pts: Presentation timestamp of raw frame decoded. This is always 0. Why isn't it read by the decoder...?
  • read_frame: d.rgbframe->pts / write_frame: inframe->pts: Presentation timestamp of decoded frame converted to RGB. Not set to anything currently.
  • read_frame: d.rawframe->pkt_*: Fields copied from packet, discovered after reading this post. They are set correctly but I don't know if they are useful.
  • write_frame: outframe->pts: Presentation timestamp of frame being encoded. Should I set this to something?
  • write_frame: outframe->pkt_*: Timing fields from a packet. Should I set these? They seem to be ignored by the encoder.
  • write_frame: packet.dts: Decoding timestamp of packet being encoded. What do I set it to?
  • write_frame: packet.pts: Presentation timestamp of packet being encoded. What do I set it to?
  • write_frame: packet.duration: Duration of packet being encoded. What do I set it to?

I have tried the following, with the described results. Note that inframe is d.rgbframe:

  1.  
    • Init e.stream->time_base = d.stream->time_base
    • Init e.codecx->time_base = d.codecx->time_base
    • Set d.rgbframe->pts = packet.dts in read_frame
    • Set outframe->pts = inframe->pts in write_frame
    • Result: Warning that the encoder time base is not set (since d.codecx->time_base was 0/1), then a segfault.
  2.  
  • Init e.stream->time_base = d.stream->time_base
  • Init e.codecx->time_base = d.stream->time_base
  • Set d.rgbframe->pts = packet.dts in read_frame
  • Set outframe->pts = inframe->pts in write_frame
  • Result: No warnings, but VLC reports the frame rate as 480.048 (no idea where this number came from) and the file plays too fast. Also the encoder sets all the timing fields in packet to 0, which was not what I expected. (Turns out this is because av_interleaved_write_frame, unlike av_write_frame, takes ownership of the packet and swaps it with a blank one, and I was printing the values after that call. So they are not ignored.)
  3.  
  • Init e.stream->time_base = d.stream->time_base
  • Init e.codecx->time_base = d.stream->time_base
  • Set d.rgbframe->pts = packet.dts in read_frame
  • Set any of pts/dts/duration in packet in write_frame to anything.
  • Result: Warnings about packet timestamps not set. Encoder seems to reset all packet timing fields to 0, so none of this has any effect.
  4.  
  • Init e.stream->time_base = d.stream->time_base
  • Init e.codecx->time_base = d.stream->time_base
  • I found these fields, pkt_pts, pkt_dts, and pkt_duration in AVFrame after reading this post, so I tried copying those all the way through to outframe.
  • Result: Really had my hopes up, but ended up with same results as attempt 3 (packet timestamp not set warning, incorrect results).

I tried various other hand-wavey permutations of the above and nothing worked. What I want to do is create an output file that plays back with the same timing and frame rate as the input (29.97 constant frame rate in this case).

So how do I do this? Of the zillions of timing related fields here, what do I do to make the output be the same as the input? And how do I do it in such a way that handles arbitrary video input formats that may store their time stamps and time bases in different places? I need this to always work.

For reference, here is a table of all the packet and frame timestamps read from the video stream of my test input file, to give a sense of what my test file looks like. None of the input packet pts values are set, and the same goes for the frame pts; for some reason the duration of the first 108 frames is 0. VLC plays the file fine and reports the frame rate as 29.9700089:

  • Table is here since it was too large for this post.

Answer

I think your issue here is with time bases which are at first a bit confusing.

  • d.stream->time_base: Input video stream time base. This is the resolution of timestamps in the input container. The encoded packets returned from av_read_frame have their timestamps in this resolution.
  • d.stream->codec->time_base: Not sure what this is. It is the old API, left in place for compatibility; you are using codec parameters, so ignore it.
  • d.codecx->time_base: Input codec context time-base. For my test input file this is 0/1. Am I supposed to set it? This is the resolution of timestamps for the codec (as opposed to the container). The codec assumes its input encoded packets have timestamps in this resolution, and it also sets the timestamps of the decoded output frames in this resolution.
  • e.stream->time_base: Time base of the output stream I create. Same as with the decoder.
  • e.stream->codec->time_base: Same as with the demuxer - ignore this one.
  • e.codecx->time_base: Same as with the demuxer.

So you need to do the following:

  • open the demuxer. That part works.
  • set the decoder time base to some "sane" value, because the decoder might not do that, and 0/1 is bad. Things won't work as they should if the time base of any component is unset. Easiest is to just copy the time base from the demuxer.
  • open the decoder. It might change its time base, or it might not.
  • set the encoder time base. Easiest is to copy the time base from the (now opened) decoder, since you are not changing frame rates or anything.
  • open the encoder. It might change its time base.
  • set the muxer time base. Again, easiest is to copy the time base from the encoder.
  • open the muxer. It might change its time base as well.

Then, for each frame:
  • read it from the demuxer
  • convert its timestamps from the demuxer to the decoder time base. There is av_packet_rescale_ts to help you do that
  • decode the packet
  • set the frame timestamp (pts) to the value returned by av_frame_get_best_effort_timestamp
  • convert the frame timestamp from the decoder to the encoder time base. Use av_rescale_q or av_rescale_q_rnd
  • encode the packet
  • convert its timestamps from the encoder to the muxer time base. Again, use av_packet_rescale_ts

This might be overkill; in particular, the encoder might not change its time base on open (in which case you don't need to convert the raw frames' pts).

Regarding flushing - frames you pass to the encoder are not necessarily encoded and output right away, so yes, you are supposed to call avcodec_encode_video2 with NULL as the frame to let the encoder know you are done and make it output all the remaining data (which you need to pass through the muxer like all the other packets). In fact, you are supposed to do so repeatedly until it stops spewing out packets. See one of the encoding examples in the doc/examples folder inside ffmpeg for some samples.
