Android MediaCodec: How to Frame-Accurately Trim Audio

Question

I am building the capability to frame-accurately trim video files on Android. Transcoding is implemented with MediaExtractor, MediaCodec, and MediaMuxer. I need help truncating arbitrary Audio frames in order to match their Video frame counterparts.

I believe the Audio frames must be trimmed in the Decoder output buffer, which is the logical place in which uncompressed audio data is available for editing.

For in/out trims I am calculating the necessary offset and size adjustments to the raw Audio buffer to shoehorn it into the available endcap frames, and I am submitting the data with the following code:

MediaCodec.BufferInfo info = pendingAudioDecoderOutputBufferInfos.poll();
...
ByteBuffer decoderOutputBuffer = audioDecoder.getOutputBuffer(decoderIndex).duplicate();
decoderOutputBuffer.position(info.offset);
decoderOutputBuffer.limit(info.offset + info.size);
encoderInputBuffer.position(0);
encoderInputBuffer.put(decoderOutputBuffer);
info.flags |= MediaCodec.BUFFER_FLAG_END_OF_STREAM;
audioEncoder.queueInputBuffer(encoderIndex, info.offset, info.size, presentationTime, info.flags);
audioDecoder.releaseOutputBuffer(decoderIndex, false);

My problem is that the data adjustments appear to affect only the data copied onto the output audio buffer; they do not shorten the audio frame that gets written to the MediaMuxer. The output video either ends up with several milliseconds of audio missing at the end of the clip or, if I write too much data, the audio frame is dropped completely from the end of the clip.

How do I trim the audio frames correctly?

Recommended answer

There are a few things at play here:

  • As Dave pointed out, you should pass 0 instead of info.offset to audioEncoder.queueInputBuffer. You already took the offset of the decoder output buffer into account when you set the buffer position with decoderOutputBuffer.position(info.offset), and the data was copied to position 0 of the encoder input buffer. But perhaps you already update it elsewhere.
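The offset handling can be sketched in plain Java without any Android classes. This is a minimal sketch, not the asker's actual code: the class `PcmCopy` and the method `copyRegion` are hypothetical names, and ordinary `java.nio.ByteBuffer` instances stand in for the codec buffers.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: stands in for the copy between codec buffers.
final class PcmCopy {
    /**
     * Copies `size` bytes starting at `offset` in the decoder output buffer
     * to the START of the encoder input buffer and returns the byte count.
     * The source offset is consumed here, so the encoder side begins at 0.
     */
    static int copyRegion(ByteBuffer decoderOutput, int offset, int size,
                          ByteBuffer encoderInput) {
        if (size > encoderInput.capacity()) {
            throw new IllegalArgumentException("region larger than encoder input buffer");
        }
        // duplicate() shares the bytes but has its own position and limit,
        // so the caller's view of the decoder buffer is left untouched.
        ByteBuffer src = decoderOutput.duplicate();
        src.limit(offset + size);
        src.position(offset);

        encoderInput.clear(); // position = 0, limit = capacity
        encoderInput.put(src);
        return size;
    }

    public static void main(String[] args) {
        ByteBuffer decoded = ByteBuffer.allocate(16);
        for (int i = 0; i < 16; i++) decoded.put(i, (byte) i);
        ByteBuffer encoderInput = ByteBuffer.allocate(32);

        int copied = copyRegion(decoded, 4, 8, encoderInput);
        System.out.println("copied " + copied + " bytes, first byte = " + encoderInput.get(0));
    }
}
```

With real codecs, the matching call would then be `audioEncoder.queueInputBuffer(encoderIndex, 0, copied, presentationTime, info.flags)`: the offset argument describes the encoder input buffer, where the data starts at 0.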

I'm not sure whether MediaCodec audio encoders allow you to pass audio data in arbitrarily sized chunks, or whether you need to send exactly one full audio frame at a time. I think the encoder might accept arbitrary sizes, in which case you're fine. If not, you need to buffer the audio yourself and pass it to the encoder once you have a full frame (in case you trimmed some out at the start).
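If manual buffering does turn out to be necessary, it could look roughly like the following. This is a sketch under assumptions: `PcmFrameChunker` is a hypothetical helper, the PCM is taken to be interleaved with a fixed sample width, and the frame size (for example 1024 samples for AAC-LC) is supplied by the caller.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: regroups arbitrarily sized PCM chunks into full frames.
final class PcmFrameChunker {
    private final byte[] pending; // holds exactly one frame's worth of bytes
    private int filled = 0;       // number of valid bytes currently in `pending`

    PcmFrameChunker(int samplesPerFrame, int channels, int bytesPerSample) {
        pending = new byte[samplesPerFrame * channels * bytesPerSample];
    }

    int frameBytes() {
        return pending.length;
    }

    /** Consumes all remaining bytes of `pcm` and returns every frame completed by them. */
    List<byte[]> push(ByteBuffer pcm) {
        List<byte[]> frames = new ArrayList<>();
        while (pcm.hasRemaining()) {
            int n = Math.min(pcm.remaining(), pending.length - filled);
            pcm.get(pending, filled, n);
            filled += n;
            if (filled == pending.length) {
                frames.add(pending.clone()); // hand out a copy; `pending` is reused
                filled = 0;
            }
        }
        return frames;
    }

    /** Returns the final partial frame padded with zeros (silence), or null if none is pending. */
    byte[] flushPadded() {
        if (filled == 0) return null;
        Arrays.fill(pending, filled, pending.length, (byte) 0);
        byte[] last = pending.clone();
        filled = 0;
        return last;
    }
}
```

Each array returned by `push` would be queued to the encoder as one input buffer. Whether `flushPadded` is needed at the end of the stream depends on how the encoder handles a trailing partial frame, which the answer leaves open.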

Keep in mind that audio is also frame based (for AAC, frames are 1024 samples unless you use the low-delay variants or HE-AAC), so at 44.1 kHz the audio duration can only be set with a granularity of about 23 ms. If you want your audio to end precisely after the right number of samples, you need to use container signaling to indicate this. I'm not sure whether the MediaCodec audio encoder flushes whatever partial frame you have at the end, or whether, if you aren't aligned to the frame size, you need to manually pass it extra zeros at the end to get the last few samples out. It might not be needed, though.
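The granularity figure follows directly from the frame size. The sketch below works through the arithmetic, assuming 1024-sample AAC-LC frames as stated above; `AacFrameMath` and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper: frame arithmetic for fixed-size audio frames.
final class AacFrameMath {
    static final int SAMPLES_PER_FRAME = 1024; // AAC-LC frame size, per the text

    /** Duration of one frame in milliseconds at the given sample rate. */
    static double frameDurationMs(int sampleRate) {
        return 1000.0 * SAMPLES_PER_FRAME / sampleRate;
    }

    /** Number of frames needed to hold `totalSamples` (rounded up). */
    static long framesNeeded(long totalSamples) {
        return (totalSamples + SAMPLES_PER_FRAME - 1) / SAMPLES_PER_FRAME;
    }

    /** Zero samples required to fill up the last frame. */
    static long paddingSamples(long totalSamples) {
        return framesNeeded(totalSamples) * SAMPLES_PER_FRAME - totalSamples;
    }

    public static void main(String[] args) {
        System.out.printf("frame duration at 44.1 kHz: %.2f ms%n", frameDurationMs(44100));
        System.out.println("frames for 1 s of audio: " + framesNeeded(44100));
        System.out.println("padding samples: " + paddingSamples(44100));
    }
}
```

For one second of audio at 44.1 kHz (44100 samples per channel), this gives 44 frames with 956 samples of padding in the last one, which is why the exact end point has to be signaled by the container instead of by the frame count.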

Encoding AAC audio does introduce some delay into the audio stream: after decoding, you'll have a number of priming samples at the start of the decoded stream. The exact number depends on the encoder; for Android's software AAC-LC encoder it's probably 2048 samples, but it might vary. In the 2048-sample case the delay lines up exactly with 2 frames of audio, but it can also be something that isn't a whole number of frames. I don't think MediaCodec signals the exact amount of delay either. If you drop the first 2 output packets from the encoder (assuming the delay is 2048 samples), you'll avoid the extra delay, but the decoded audio for the first few frames won't be exactly right. (The priming packets are necessary to properly represent whatever samples your stream starts with; without them, the decoded output will more or less converge towards your intended audio within 2048 samples.)
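To make the delay concrete, the following sketch converts a priming-sample count into a time offset and into whole encoder packets. The value 2048 is only the guess quoted above, not something MediaCodec reports, and `PrimingDelay` is a hypothetical helper.

```java
// Hypothetical helper: arithmetic on an ASSUMED encoder priming delay.
final class PrimingDelay {
    /** Delay in microseconds (truncated) for the given number of priming samples. */
    static long delayUs(int primingSamples, int sampleRate) {
        return primingSamples * 1_000_000L / sampleRate;
    }

    /** Whole encoder packets covered by the priming samples. */
    static int wholePackets(int primingSamples, int samplesPerFrame) {
        return primingSamples / samplesPerFrame;
    }

    /** Priming samples left over that do not fill a whole packet. */
    static int leftoverSamples(int primingSamples, int samplesPerFrame) {
        return primingSamples % samplesPerFrame;
    }

    public static void main(String[] args) {
        int priming = 2048; // assumption taken from the text, may differ per encoder
        System.out.println("delay: " + delayUs(priming, 44100) + " us");
        System.out.println("packets to drop: " + wholePackets(priming, 1024));
        System.out.println("leftover samples: " + leftoverSamples(priming, 1024));
    }
}
```

A nonzero leftover means that dropping whole packets cannot remove the delay exactly, which matches the caveat that the delay is not always a whole number of frames.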
