C++ FFmpeg distorted sound when converting audio


Question

I'm using the FFmpeg library to generate MP4 files containing audio from various source files (MP3, WAV, OGG), but I'm running into trouble. (I'm also putting video in there, but I've got that working, so I'm omitting it from this question for simplicity's sake.) My current code opens an audio file, decodes the content, converts it for the MP4 container, and finally writes it into the destination file as interleaved frames.

It works perfectly for most MP3 files, but when inputting WAV or OGG, the audio in the resulting MP4 is slightly distorted and often plays at the wrong speed (up to many times faster or slower).

I've looked at countless examples of using the conversion functions (swr_convert), but I can't seem to get rid of the noise in the exported audio.

Here's how I add an audio stream to the MP4 (outContext is the AVFormatContext for the output file):

audioCodec = avcodec_find_encoder(outContext->oformat->audio_codec);
if (!audioCodec)
    die("Could not find audio encoder!");


// Start stream
audioStream = avformat_new_stream(outContext, audioCodec);
if (!audioStream)
    die("Could not allocate audio stream!");

audioCodecContext = audioStream->codec;
audioStream->id = 1;


// Setup
audioCodecContext->sample_fmt = AV_SAMPLE_FMT_S16;
audioCodecContext->bit_rate = 128000;
audioCodecContext->sample_rate = 44100;
audioCodecContext->channels = 2;
audioCodecContext->channel_layout = AV_CH_LAYOUT_STEREO;


// Open the codec
if (avcodec_open2(audioCodecContext, audioCodec, NULL) < 0)
    die("Could not open audio codec");

And this is how I open a sound file (MP3/WAV/OGG, given by the filename variable):

// Create context
formatContext = avformat_alloc_context();
if (avformat_open_input(&formatContext, filename, NULL, NULL)<0)
    die("Could not open file");


// Find info
if (avformat_find_stream_info(formatContext, 0)<0)
    die("Could not find file info");

av_dump_format(formatContext, 0, filename, false);


// Find audio stream
streamId = av_find_best_stream(formatContext, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);
if (streamId < 0)
    die("Could not find Audio Stream");

codecContext = formatContext->streams[streamId]->codec;
fileCodecContext = codecContext; // same decoder context; the decode loop below uses this name


// Find decoder
codec = avcodec_find_decoder(codecContext->codec_id);
if (codec == NULL)
    die("cannot find codec!");


// Open codec
if (avcodec_open2(codecContext, codec, 0)<0)
    die("Codec cannot be found");


// Set up resample context
swrContext = swr_alloc();
if (!swrContext)
    die("Failed to alloc swr context");

av_opt_set_int(swrContext, "in_channel_count", codecContext->channels, 0);
av_opt_set_int(swrContext, "in_channel_layout", codecContext->channel_layout, 0);
av_opt_set_int(swrContext, "in_sample_rate", codecContext->sample_rate, 0);
av_opt_set_sample_fmt(swrContext, "in_sample_fmt", codecContext->sample_fmt, 0);

av_opt_set_int(swrContext, "out_channel_count", audioCodecContext->channels, 0);
av_opt_set_int(swrContext, "out_channel_layout", audioCodecContext->channel_layout, 0);
av_opt_set_int(swrContext, "out_sample_rate", audioCodecContext->sample_rate, 0);
av_opt_set_sample_fmt(swrContext, "out_sample_fmt", audioCodecContext->sample_fmt, 0);

if (swr_init(swrContext))
    die("Failed to init swr context");

Finally, to decode+convert+encode...

// Allocate and init re-usable frames
audioFrameDecoded = av_frame_alloc();
if (!audioFrameDecoded)
        die("Could not allocate audio frame");

audioFrameDecoded->format = fileCodecContext->sample_fmt;
audioFrameDecoded->channel_layout = fileCodecContext->channel_layout;
audioFrameDecoded->channels = fileCodecContext->channels;
audioFrameDecoded->sample_rate = fileCodecContext->sample_rate;

audioFrameConverted = av_frame_alloc();
if (!audioFrameConverted)
        die("Could not allocate audio frame");

audioFrameConverted->nb_samples = audioCodecContext->frame_size;
audioFrameConverted->format = audioCodecContext->sample_fmt;
audioFrameConverted->channel_layout = audioCodecContext->channel_layout;
audioFrameConverted->channels = audioCodecContext->channels;
audioFrameConverted->sample_rate = audioCodecContext->sample_rate;

AVPacket inPacket;
av_init_packet(&inPacket);
inPacket.data = NULL;
inPacket.size = 0;

int frameFinished = 0;

while (av_read_frame(formatContext, &inPacket) >= 0) {

        if (inPacket.stream_index == streamId) {

                int len = avcodec_decode_audio4(fileCodecContext, audioFrameDecoded, &frameFinished, &inPacket);

                if (frameFinished) {

                        // Convert

                        uint8_t *convertedData=NULL;

                        if (av_samples_alloc(&convertedData,
                                             NULL,
                                             audioCodecContext->channels,
                                             audioFrameConverted->nb_samples,
                                             audioCodecContext->sample_fmt, 0) < 0)
                                die("Could not allocate samples");

                        int outSamples = swr_convert(swrContext,
                                                     &convertedData,
                                                     audioFrameConverted->nb_samples,
                                                     (const uint8_t **)audioFrameDecoded->data,
                                                     audioFrameDecoded->nb_samples);
                        if (outSamples < 0)
                                die("Could not convert");

                        size_t buffer_size = av_samples_get_buffer_size(NULL,
                                                                        audioCodecContext->channels,
                                                                        audioFrameConverted->nb_samples,
                                                                        audioCodecContext->sample_fmt,
                                                                        0);
                        if (buffer_size < 0)
                                die("Invalid buffer size");

                        if (avcodec_fill_audio_frame(audioFrameConverted,
                                                     audioCodecContext->channels,
                                                     audioCodecContext->sample_fmt,
                                                     convertedData,
                                                     buffer_size,
                                                     0) < 0)
                                die("Could not fill frame");

                        AVPacket outPacket;
                        av_init_packet(&outPacket);
                        outPacket.data = NULL;
                        outPacket.size = 0;

                        if (avcodec_encode_audio2(audioCodecContext, &outPacket, audioFrameConverted, &frameFinished) < 0)
                                die("Error encoding audio frame");

                        if (frameFinished) {
                                outPacket.stream_index = audioStream->index;

                                if (av_interleaved_write_frame(outContext, &outPacket) != 0)
                                        die("Error while writing audio frame");

                                av_free_packet(&outPacket);
                        }
                }
        }
}

av_frame_free(&audioFrameConverted);
av_frame_free(&audioFrameDecoded);
av_free_packet(&inPacket);

I have also tried setting appropriate pts values for outgoing frames, but that doesn't seem to affect the sound quality at all.

I'm also unsure how (or whether) I should be allocating the converted data. Can av_samples_alloc be used for this? What about avcodec_fill_audio_frame? Am I on the right track?

Any input is appreciated. I can also send the exported MP4s if you want to hear the distortion.

Recommended answer

if (avcodec_encode_audio2(audioCodecContext, &outPacket, audioFrameConverted, &frameFinished) < 0)
                die("Error encoding audio frame");

You seem to be assuming that the encoder will eat all submitted samples - it doesn't. It also doesn't cache them internally. It will eat a specific number of samples (AVCodecContext.frame_size), and the rest should be resubmitted in the next call to avcodec_encode_audio2().
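To make the frame_size point concrete, here is a minimal sketch in plain C with no FFmpeg calls. SampleFifo, fifo_push, fifo_pop_frame, and FIFO_CAPACITY are hypothetical names invented for this illustration, and the queue holds mono 16-bit samples only. It shows the bookkeeping the caller has to do: whatever the decoder hands over is queued, and the encoder is only ever given exactly frame_size samples, with the remainder kept for the next round.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_CAPACITY 8192 /* hypothetical capacity, in samples */

/* Hypothetical mono 16-bit sample queue; not an FFmpeg type. */
typedef struct {
    int16_t data[FIFO_CAPACITY];
    size_t  count; /* samples currently stored */
} SampleFifo;

/* Append n samples. Returns 0 on success, -1 if they would not fit. */
static int fifo_push(SampleFifo *f, const int16_t *src, size_t n)
{
    assert(f && src);
    if (n > FIFO_CAPACITY - f->count)
        return -1;
    memcpy(f->data + f->count, src, n * sizeof *src);
    f->count += n;
    return 0;
}

/* Remove exactly frame_size samples into dst.
 * Returns 1 if a full frame was available, 0 otherwise (nothing is removed). */
static int fifo_pop_frame(SampleFifo *f, int16_t *dst, size_t frame_size)
{
    assert(f && dst && frame_size > 0);
    if (f->count < frame_size)
        return 0;
    memcpy(dst, f->data, frame_size * sizeof *dst);
    f->count -= frame_size;
    /* Shift the leftover samples to the front so they are resubmitted next time. */
    memmove(f->data, f->data + frame_size, f->count * sizeof *dst);
    return 1;
}
```

In a real program the loop would call fifo_pop_frame until it returns 0 and pass each full frame to the encoder. FFmpeg also ships its own AVAudioFifo (libavutil/audio_fifo.h) for this kind of buffering, which is probably a better fit than a hand-rolled queue.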

OK, so your edited code is better, but not there yet. You're still assuming the decoder will output at least frame_size samples (after resampling) for each call to avcodec_decode_audioN(), which may not be the case. If that happens (and it does, for ogg), your avcodec_encode_audioN() call will encode an incomplete input buffer (because you say it's got frame_size samples, but it doesn't). Likewise, your code doesn't deal with cases where the decoder outputs significantly more samples than the frame_size the encoder expects (like 10*frame_size), in which case you'll get overruns. Basically, your 1:1 decode/encode mapping is the main source of your problem.
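The mismatch can be checked with simple arithmetic. The sketch below is plain C and does not call FFmpeg; resampled_count, FrameTally, and tally_decoder_output are hypothetical helpers, and the calculation ignores any delay a real resampler keeps internally. It estimates how many samples each decoder call yields after rate conversion and tallies how often that matches the encoder's frame size.

```c
#include <assert.h>
#include <stdint.h>

/* Samples produced when n_in samples at in_rate are converted to out_rate,
 * rounded up. Ignores any delay buffered inside a real resampler. */
static int64_t resampled_count(int64_t n_in, int in_rate, int out_rate)
{
    assert(n_in >= 0 && in_rate > 0 && out_rate > 0);
    return (n_in * out_rate + in_rate - 1) / in_rate;
}

/* Tally of how decoder calls compare with the encoder's frame size. */
typedef struct {
    int too_short; /* fewer samples than one encoder frame */
    int exact;     /* exactly one encoder frame */
    int too_long;  /* more samples than one encoder frame */
} FrameTally;

/* decoded[i] is the sample count returned by the i-th decoder call. */
static FrameTally tally_decoder_output(const int *decoded, int n_calls,
                                       int in_rate, int out_rate,
                                       int frame_size)
{
    assert((decoded || n_calls == 0) && n_calls >= 0 && frame_size > 0);
    FrameTally t = {0, 0, 0};
    for (int i = 0; i < n_calls; i++) {
        int64_t n = resampled_count(decoded[i], in_rate, out_rate);
        if (n < frame_size)
            t.too_short++;
        else if (n == frame_size)
            t.exact++;
        else
            t.too_long++;
    }
    return t;
}
```

With illustrative inputs of 1024, 128, and 10240 decoded samples, equal sample rates, and a frame size of 1024, only the first call lines up with the encoder. Once the rates differ (say 48000 to 44100), even a 1024-sample decoder frame no longer maps to a full encoder frame.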

As a solution, consider the swrContext a FIFO: feed all decoder samples into it, then loop over it until it has fewer than frame_size samples left. I'll leave it up to you to learn how to deal with end-of-stream, because you'll need to flush cached samples out of the decoder (by calling avcodec_decode_audioN() with an AVPacket where .data = NULL and .size = 0), flush the swrContext (by calling swr_convert() until it returns 0), and flush the encoder (by feeding it NULL AVFrames until it returns an AVPacket with .size = 0). Right now you'll probably get an output file where the end is slightly truncated. That shouldn't be hard to figure out.
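Once the decoder and resampler have been drained, fewer than frame_size samples may still be left over. One possible way to avoid dropping them, shown here as an assumption rather than something the answer prescribes, is to pad the final frame with silence before handing it to the encoder. The helper pad_final_frame is hypothetical, works on mono 16-bit samples, and uses no FFmpeg calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `remaining` leftover samples into `frame` and zero-fill the rest,
 * so the frame holds exactly frame_size samples.
 * Returns the number of silent padding samples that were added. */
static size_t pad_final_frame(int16_t *frame, size_t frame_size,
                              const int16_t *leftover, size_t remaining)
{
    assert(frame && frame_size > 0 && remaining <= frame_size);
    assert(leftover || remaining == 0);
    if (remaining > 0)
        memcpy(frame, leftover, remaining * sizeof *frame);
    /* For signed 16-bit PCM, all-zero bytes are digital silence. */
    memset(frame + remaining, 0, (frame_size - remaining) * sizeof *frame);
    return frame_size - remaining;
}
```

The padding lengthens the output by at most one frame. Whether an encoder accepts a shorter last frame instead depends on the codec, so check its capabilities before relying on either approach.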

This code works for me for m4a/ogg/mp3 to m4a/aac conversion:

#include "libswresample/swresample.h"
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libavutil/opt.h"

#include <stdio.h>
#include <stdlib.h>

static void die(const char *str) {
    fprintf(stderr, "%s\n", str);
    exit(1);
}

static AVStream *add_audio_stream(AVFormatContext *oc, enum AVCodecID codec_id)
{
    AVCodecContext *c;
    AVCodec *encoder = avcodec_find_encoder(codec_id);
    if (!encoder) die("avcodec_find_encoder"); // encoder->sample_fmts is read below
    AVStream *st = avformat_new_stream(oc, encoder);

    if (!st) die("av_new_stream");

    c = st->codec;
    c->codec_id = codec_id;
    c->codec_type = AVMEDIA_TYPE_AUDIO;

    /* put sample parameters */
    c->bit_rate = 64000;
    c->sample_rate = 44100;
    c->channels = 2;
    c->sample_fmt = encoder->sample_fmts[0];
    c->channel_layout = AV_CH_LAYOUT_STEREO;

    // some formats want stream headers to be separate
    if(oc->oformat->flags & AVFMT_GLOBALHEADER)
        c->flags |= CODEC_FLAG_GLOBAL_HEADER;

    return st;
}

static void open_audio(AVFormatContext *oc, AVStream *st)
{
    AVCodecContext *c = st->codec;
    AVCodec *codec;

    /* find the audio encoder */
    codec = avcodec_find_encoder(c->codec_id);
    if (!codec) die("avcodec_find_encoder");

    /* open it */
    AVDictionary *dict = NULL;
    av_dict_set(&dict, "strict", "+experimental", 0);
    int res = avcodec_open2(c, codec, &dict);
    if (res < 0) die("avcodec_open");
}

int main(int argc, char *argv[]) {
    av_register_all();

    if (argc != 3) {
        fprintf(stderr, "%s <in> <out>\n", argv[0]);
        exit(1);
    }

    // Decoder/encoder contexts and input/output format contexts
    AVCodecContext *fileCodecContext, *audioCodecContext;
    AVFormatContext *formatContext = NULL, *outContext; // must be NULL so avformat_open_input() allocates it
    AVStream *audioStream;
    SwrContext *swrContext;
    int streamId;

    // input file
    const char *file = argv[1];
    int res = avformat_open_input(&formatContext, file, NULL, NULL);
    if (res != 0) die("avformat_open_input");
    res = avformat_find_stream_info(formatContext, NULL);
    if (res < 0) die("avformat_find_stream_info");
    AVCodec *codec;
    res = av_find_best_stream(formatContext, AVMEDIA_TYPE_AUDIO, -1, -1, &codec, 0);
    if (res < 0) die("av_find_best_stream");
    streamId = res;
    fileCodecContext = avcodec_alloc_context3(codec);
    avcodec_copy_context(fileCodecContext, formatContext->streams[streamId]->codec);
    res = avcodec_open2(fileCodecContext, codec, NULL);
    if (res < 0) die("avcodec_open2");

    // output file
    const char *outfile = argv[2];
    AVOutputFormat *fmt = av_guess_format(NULL, outfile, NULL);
    if (!fmt) die("av_guess_format");
    outContext = avformat_alloc_context();
    outContext->oformat = fmt;
    audioStream = add_audio_stream(outContext, fmt->audio_codec);
    open_audio(outContext, audioStream);
    res = avio_open2(&outContext->pb, outfile, AVIO_FLAG_WRITE, NULL, NULL);
    if (res < 0) die("url_fopen");
    avformat_write_header(outContext, NULL);
    audioCodecContext = audioStream->codec;

    // resampling
    swrContext = swr_alloc();
    av_opt_set_channel_layout(swrContext, "in_channel_layout",  fileCodecContext->channel_layout, 0);
    av_opt_set_channel_layout(swrContext, "out_channel_layout", audioCodecContext->channel_layout, 0);
    av_opt_set_int(swrContext, "in_sample_rate", fileCodecContext->sample_rate, 0);
    av_opt_set_int(swrContext, "out_sample_rate", audioCodecContext->sample_rate, 0);
    av_opt_set_sample_fmt(swrContext, "in_sample_fmt", fileCodecContext->sample_fmt, 0);
    av_opt_set_sample_fmt(swrContext, "out_sample_fmt", audioCodecContext->sample_fmt, 0);
    res = swr_init(swrContext);
    if (res < 0) die("swr_init");

    AVFrame *audioFrameDecoded = av_frame_alloc();
    if (!audioFrameDecoded)
        die("Could not allocate audio frame");

    audioFrameDecoded->format = fileCodecContext->sample_fmt;
    audioFrameDecoded->channel_layout = fileCodecContext->channel_layout;
    audioFrameDecoded->channels = fileCodecContext->channels;
    audioFrameDecoded->sample_rate = fileCodecContext->sample_rate;

    AVFrame *audioFrameConverted = av_frame_alloc();
    if (!audioFrameConverted) die("Could not allocate audio frame");

    audioFrameConverted->nb_samples = audioCodecContext->frame_size;
    audioFrameConverted->format = audioCodecContext->sample_fmt;
    audioFrameConverted->channel_layout = audioCodecContext->channel_layout;
    audioFrameConverted->channels = audioCodecContext->channels;
    audioFrameConverted->sample_rate = audioCodecContext->sample_rate;

    AVPacket inPacket;
    av_init_packet(&inPacket);
    inPacket.data = NULL;
    inPacket.size = 0;

    int frameFinished = 0;

    while (av_read_frame(formatContext, &inPacket) >= 0) {
        if (inPacket.stream_index == streamId) {
            int len = avcodec_decode_audio4(fileCodecContext, audioFrameDecoded, &frameFinished, &inPacket);

            if (frameFinished) {

                // Convert

                uint8_t *convertedData=NULL;

                if (av_samples_alloc(&convertedData,
                             NULL,
                             audioCodecContext->channels,
                             audioFrameConverted->nb_samples,
                             audioCodecContext->sample_fmt, 0) < 0)
                    die("Could not allocate samples");

                int outSamples = swr_convert(swrContext, NULL, 0,
                             //&convertedData,
                             //audioFrameConverted->nb_samples,
                             (const uint8_t **)audioFrameDecoded->data,
                             audioFrameDecoded->nb_samples);
                if (outSamples < 0) die("Could not convert");

                for (;;) {
                     outSamples = swr_get_out_samples(swrContext, 0);
                     if (outSamples < audioCodecContext->frame_size * audioCodecContext->channels) break; // see comments, thanks to @dajuric for fixing this

                     outSamples = swr_convert(swrContext,
                                              &convertedData,
                                              audioFrameConverted->nb_samples, NULL, 0);

                     size_t buffer_size = av_samples_get_buffer_size(NULL,
                                    audioCodecContext->channels,
                                    audioFrameConverted->nb_samples,
                                    audioCodecContext->sample_fmt,
                                    0);
                    if (buffer_size < 0) die("Invalid buffer size");

                    if (avcodec_fill_audio_frame(audioFrameConverted,
                             audioCodecContext->channels,
                             audioCodecContext->sample_fmt,
                             convertedData,
                             buffer_size,
                             0) < 0)
                        die("Could not fill frame");

                    AVPacket outPacket;
                    av_init_packet(&outPacket);
                    outPacket.data = NULL;
                    outPacket.size = 0;

                    if (avcodec_encode_audio2(audioCodecContext, &outPacket, audioFrameConverted, &frameFinished) < 0)
                        die("Error encoding audio frame");

                    if (frameFinished) {
                        outPacket.stream_index = audioStream->index;

                        if (av_interleaved_write_frame(outContext, &outPacket) != 0)
                            die("Error while writing audio frame");

                        av_free_packet(&outPacket);
                    }
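                }
                av_freep(&convertedData); // release the per-frame conversion buffer
            }
        }
        av_free_packet(&inPacket); // release each packet once it has been handled
    }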

    swr_close(swrContext);
    swr_free(&swrContext);
    av_frame_free(&audioFrameConverted);
    av_frame_free(&audioFrameDecoded);
    av_free_packet(&inPacket);
    av_write_trailer(outContext);
    avio_close(outContext->pb);
    avcodec_close(fileCodecContext);
    avcodec_free_context(&fileCodecContext);
    avformat_close_input(&formatContext);

    return 0;
}
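The listing above does not set timestamps on the frames it encodes. If you want to set them, a common approach is to keep a running count of samples sent to the encoder and convert that count into the target time base. The sketch below shows only the arithmetic in plain C. samples_to_pts is a hypothetical helper, it truncates rather than rounds, and in real FFmpeg code av_rescale_q() would normally do this conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a running sample count into a timestamp expressed in a time base
 * of tb_num/tb_den seconds per tick. The result is truncated toward zero. */
static int64_t samples_to_pts(int64_t samples, int sample_rate,
                              int tb_num, int tb_den)
{
    assert(samples >= 0 && sample_rate > 0 && tb_num > 0 && tb_den > 0);
    /* seconds = samples / sample_rate; ticks = seconds * tb_den / tb_num */
    return samples * tb_den / ((int64_t)sample_rate * tb_num);
}
```

For example, with a time base of 1/sample_rate the timestamp is simply the sample count, so each 1024-sample frame advances the pts by 1024.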
