How to encode resampled PCM-audio to AAC using ffmpeg-API when input pcm samples count not equal 1024


Question

I am working on capturing and streaming audio to an RTMP server at the moment. I work under MacOS (in Xcode), so for capturing the audio sample buffer I use the AVFoundation framework. But for encoding and streaming I need to use the ffmpeg API and the libfaac encoder, so the output format must be AAC (to support stream playback on iOS devices).

And I faced such a problem: the audio-capturing device (in my case a Logitech camera) gives me a sample buffer with 512 LPCM samples, and I can select the input sample rate from 16000, 24000, 36000 or 48000 Hz. When I give these 512 samples to the AAC encoder (configured for the appropriate sample rate), I hear slow and jerky audio (it sounds like a piece of silence after each frame).

I figured out (maybe I am wrong) that the libfaac encoder accepts audio frames only with 1024 samples. When I set the input sample rate to 24000 and resample the input sample buffer to 48000 before encoding, I obtain 1024 resampled samples. After encoding these 1024 samples to AAC, I hear the proper sound on output. But my webcam produces 512 samples per buffer for any input sample rate, while the output sample rate must be 48000 Hz. So I need to resample in any case, and after resampling I will not get exactly 1024 samples in the buffer.

Is there any way to solve this problem with the ffmpeg-API functions?

I would be grateful for any help.

PS: I guess that I could accumulate resampled buffers until the sample count reaches 1024 and then encode them, but this is a stream, so there would be trouble with the resulting timestamps and with other input devices, and such a solution is not suitable.

The current issue came out of the problem described in the question: How to fill audio AVFrame (ffmpeg) with the data obtained from CMSampleBufferRef (AVFoundation)?

Here is the code with the audio-codec configuration (there is also a video stream, but the video works fine):

    /*global variables*/
    static AVFrame *aframe;
    static AVFrame *frame;
    AVOutputFormat *fmt; 
    AVFormatContext *oc; 
    AVStream *audio_st, *video_st;
void Init()
{
    AVCodec *audio_codec, *video_codec;
    int ret;

    avcodec_register_all();  
    av_register_all();
    avformat_network_init();
    avformat_alloc_output_context2(&oc, NULL, "flv", filename);
    fmt = oc->oformat;
    oc->oformat->video_codec = AV_CODEC_ID_H264;
    oc->oformat->audio_codec = AV_CODEC_ID_AAC;
    video_st = NULL;
    audio_st = NULL;
    if (fmt->video_codec != AV_CODEC_ID_NONE)
      { /* … init video codec … */ }
    if (fmt->audio_codec != AV_CODEC_ID_NONE) {
    audio_codec= avcodec_find_encoder(fmt->audio_codec);

    if (!(audio_codec)) {
        fprintf(stderr, "Could not find encoder for '%s'\n",
                avcodec_get_name(fmt->audio_codec));
        exit(1);
    }
    audio_st= avformat_new_stream(oc, audio_codec);
    if (!audio_st) {
        fprintf(stderr, "Could not allocate stream\n");
        exit(1);
    }
    audio_st->id = oc->nb_streams-1;

    //AAC:
    audio_st->codec->sample_fmt  = AV_SAMPLE_FMT_S16;
    audio_st->codec->bit_rate    = 32000;
    audio_st->codec->sample_rate = 48000;
    audio_st->codec->profile=FF_PROFILE_AAC_LOW;
    audio_st->time_base = (AVRational){1, audio_st->codec->sample_rate };
    audio_st->codec->channels    = 1;
    audio_st->codec->channel_layout = AV_CH_LAYOUT_MONO;      


    if (oc->oformat->flags & AVFMT_GLOBALHEADER)
        audio_st->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
    }

    if (video_st)
    {
    //   …
    /*prepare video*/
    }
    if (audio_st)
    {
    aframe = avcodec_alloc_frame();
    if (!aframe) {
        fprintf(stderr, "Could not allocate audio frame\n");
        exit(1);
    }
    AVCodecContext *c;
    int ret;

    c = audio_st->codec;


    ret = avcodec_open2(c, audio_codec, NULL);
    if (ret < 0) {
        fprintf(stderr, "Could not open audio codec: %s\n", av_err2str(ret));
        exit(1);
    }

    //…
    }
}

And the resampling and encoding of the audio:

if (mType == kCMMediaType_Audio)
{
    CMSampleTimingInfo timing_info;
    CMSampleBufferGetSampleTimingInfo(sampleBuffer, 0, &timing_info);
    double  pts=0;
    double  dts=0;
    AVCodecContext *c;
    AVPacket pkt = { 0 }; // data and size must be 0;
    int got_packet, ret;
     av_init_packet(&pkt);
    c = audio_st->codec;
      CMItemCount numSamples = CMSampleBufferGetNumSamples(sampleBuffer);

    NSUInteger channelIndex = 0;

    CMBlockBufferRef audioBlockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t audioBlockBufferOffset = (channelIndex * numSamples * sizeof(SInt16));
    size_t lengthAtOffset = 0;
    size_t totalLength = 0;
    SInt16 *samples = NULL;
    CMBlockBufferGetDataPointer(audioBlockBuffer, audioBlockBufferOffset, &lengthAtOffset, &totalLength, (char **)(&samples));

    const AudioStreamBasicDescription *audioDescription = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(sampleBuffer));

    SwrContext *swr = swr_alloc();

    int in_smprt = (int)audioDescription->mSampleRate;
    av_opt_set_int(swr, "in_channel_layout",  AV_CH_LAYOUT_MONO, 0);
    av_opt_set_int(swr, "out_channel_layout", audio_st->codec->channel_layout, 0);
    av_opt_set_int(swr, "in_channel_count",   audioDescription->mChannelsPerFrame, 0);
    av_opt_set_int(swr, "out_channel_count",  audio_st->codec->channels, 0);
    av_opt_set_int(swr, "in_sample_rate",     audioDescription->mSampleRate, 0);
    av_opt_set_int(swr, "out_sample_rate",    audio_st->codec->sample_rate, 0);
    av_opt_set_sample_fmt(swr, "in_sample_fmt",  AV_SAMPLE_FMT_S16, 0);
    av_opt_set_sample_fmt(swr, "out_sample_fmt", audio_st->codec->sample_fmt, 0);

    swr_init(swr);
    uint8_t **input = NULL;
    int src_linesize;
    int in_samples = (int)numSamples;
    ret = av_samples_alloc_array_and_samples(&input, &src_linesize, audioDescription->mChannelsPerFrame,
                                             in_samples, AV_SAMPLE_FMT_S16P, 0);
    // Point the input plane directly at the captured samples (mono S16, so
    // interleaved and planar layouts coincide). Note this overwrites -- and
    // therefore leaks -- the buffer allocated just above.
    *input = (uint8_t *)samples;
    uint8_t *output = NULL;

    int out_samples = av_rescale_rnd(swr_get_delay(swr, in_smprt) + in_samples,
                                     audio_st->codec->sample_rate, in_smprt, AV_ROUND_UP);
    av_samples_alloc(&output, NULL, audio_st->codec->channels, out_samples, audio_st->codec->sample_fmt, 0);
    out_samples = swr_convert(swr, &output, out_samples, (const uint8_t **)input, in_samples);


    aframe->nb_samples = out_samples;
    ret = avcodec_fill_audio_frame(aframe, audio_st->codec->channels, audio_st->codec->sample_fmt,
                                   output,
                                   out_samples *
                                   av_get_bytes_per_sample(audio_st->codec->sample_fmt) *
                                   audio_st->codec->channels, 1);

    aframe->channel_layout = audio_st->codec->channel_layout;
    aframe->channels = audio_st->codec->channels;
    aframe->sample_rate = audio_st->codec->sample_rate;

    if (timing_info.presentationTimeStamp.timescale!=0)
        pts=(double) timing_info.presentationTimeStamp.value/timing_info.presentationTimeStamp.timescale;

    aframe->pts=pts*audio_st->time_base.den;
    aframe->pts = av_rescale_q(aframe->pts, audio_st->time_base, audio_st->codec->time_base);

    ret = avcodec_encode_audio2(c, &pkt, aframe, &got_packet);

    if (ret < 0) {
        fprintf(stderr, "Error encoding audio frame: %s\n", av_err2str(ret));
        exit(1);
    }
    swr_free(&swr);
    if (got_packet)
    {
        pkt.stream_index = audio_st->index;

        pkt.pts = av_rescale_q(pkt.pts, audio_st->codec->time_base, audio_st->time_base);
        pkt.dts = av_rescale_q(pkt.dts, audio_st->codec->time_base, audio_st->time_base);

        // Write the compressed frame to the media file.
       ret = av_interleaved_write_frame(oc, &pkt);
       if (ret != 0) {
            fprintf(stderr, "Error while writing audio frame: %s\n",
                    av_err2str(ret));
            exit(1);
        }

    }
}

Answer

I ran into a similar problem. I was encoding PCM packets to AAC, and the PCM packets were sometimes shorter than 1024 samples.

If I encoded a packet smaller than 1024 samples, the audio played back slow; on the other hand, if I threw it away, the audio sped up. From my observation, the swr_convert function does not do any automatic buffering.

I ended up with a buffering scheme: incoming packets fill a 1024-sample buffer, and the buffer is encoded and cleared every time it becomes full.

The function that fills the buffer is below:

// put frame data into buffer of fixed size
bool ffmpegHelper::putAudioBuffer(const AVFrame *pAvFrameIn, AVFrame **pAvFrameBuffer, AVCodecContext *dec_ctx, int frame_size, int &k0) {
  // prepare pFrameAudio
  if (!(*pAvFrameBuffer)) {
    if (!(*pAvFrameBuffer = av_frame_alloc())) {
      av_log(NULL, AV_LOG_ERROR, "Alloc frame failed\n");
      return false;
    } else {
      (*pAvFrameBuffer)->format = dec_ctx->sample_fmt;
      (*pAvFrameBuffer)->channels = dec_ctx->channels;
      (*pAvFrameBuffer)->sample_rate = dec_ctx->sample_rate;
      (*pAvFrameBuffer)->nb_samples = frame_size;
      int ret = av_frame_get_buffer(*pAvFrameBuffer, 0);
      if (ret < 0) {
        char err[500];
        av_log(NULL, AV_LOG_ERROR, "get audio buffer failed: %s\n",
          av_make_error_string(err, AV_ERROR_MAX_STRING_SIZE, ret));
        return false;
      }
      (*pAvFrameBuffer)->nb_samples = 0;
      (*pAvFrameBuffer)->pts = pAvFrameIn->pts;
    }
  }

  // copy input data to buffer
  int n_channels = pAvFrameIn->channels;
  int new_samples = std::min(pAvFrameIn->nb_samples - k0, frame_size - (*pAvFrameBuffer)->nb_samples);
  int k1 = (*pAvFrameBuffer)->nb_samples;

  if (pAvFrameIn->format == AV_SAMPLE_FMT_S16) {
    int16_t *d_in = (int16_t *)pAvFrameIn->data[0];
    d_in += n_channels * k0;
    int16_t *d_out = (int16_t *)(*pAvFrameBuffer)->data[0];
    d_out += n_channels * k1;

    for (int i = 0; i < new_samples; ++i) {
      for (int j = 0; j < pAvFrameIn->channels; ++j) {
        *d_out++ = *d_in++;
      }
    }
  } else {
    printf("not handled format for audio buffer\n");
    return false;
  }

  (*pAvFrameBuffer)->nb_samples += new_samples;
  k0 += new_samples;

  return true;
}

And the loop that fills the buffer and encodes is below:

// transcoding needed
int got_frame;
AVMediaType stream_type;
// decode the packet (do it yourself)
decodePacket(packet, dec_ctx, &pAvFrame_, got_frame);

if (enc_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
    ret = 0;
    // break audio packet down to buffer
    if (enc_ctx->frame_size > 0) {
        int k = 0;
        while (k < pAvFrame_->nb_samples) {
            if (!putAudioBuffer(pAvFrame_, &pFrameAudio_, dec_ctx, enc_ctx->frame_size, k))
                return false;
            if (pFrameAudio_->nb_samples == enc_ctx->frame_size) {
                // the buffer is full, encode it (do it yourself)
                ret = encodeFrame(pFrameAudio_, stream_index, got_frame, false);
                if (ret < 0)
                    return false;
                pFrameAudio_->pts += enc_ctx->frame_size;
                pFrameAudio_->nb_samples = 0;
            }
        }
    } else {
        ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
    }
} else {
    // encode packet directly
    ret = encodeFrame(pAvFrame_, stream_index, got_frame, false);
}
