Android 提取解码编码多路复用音频 [英] Android Extract Decode Encode Mux Audio

查看：41 发布时间：2021/11/27 19:42:07 android audio android-mediacodec multiplexing mediamuxer

本文介绍了Android 提取解码编码多路复用音频的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试修改 ExtractDecodeEditEncodeMuxTest.java 以便从通过 Cordova 的 device.capture.captureVideo 录制的 mp4 中提取音频和视频、解码音频、编辑解码的音频样本、编码音频和复用将音频与视频一起返回并再次保存为 mp4.

I am trying to adapt the code found in ExtractDecodeEditEncodeMuxTest.java in order to extract audio and video from a mp4 recorded via Cordova's device.capture.captureVideo, decode the audio, edit the decoded audio samples, encode the audio, and mux the audio back with the video and save as an mp4 again.

我的第一次尝试只是提取、解码、编码和多路复用音频，而不尝试编辑任何音频样本——如果我能做到这一点，我相当肯定我可以根据需要编辑解码的样本.我不需要编辑视频，所以我假设我可以简单地使用 MediaExtractor 来提取和混合视频轨道.

My first attempt is simply to extract, decode, encode and mux audio without trying to edit any of the audio samples - if I can do this I am fairly certain that I can edit the decoded samples as desired. I don't need to edit the video, so I assume I can simply use MediaExtractor to extract and mux the video track.

但是，我遇到的问题是我似乎无法正确进行音频解码/编码过程.不断发生的事情是多路复用器从提取的视频轨道和提取的 -> 解码 -> 编码的音频轨道创建 mp4，但是当视频播放正常时，音频开始时会发出短促的噪音，然后似乎是最后一个几秒钟的音频数据正常播放(但在视频的开头)，然后在视频的其余部分静音.

However, the problem I am having is that I cannot seem to get the audio decoding/encoding process right. What keeps happening is that the muxer creates the mp4 from the extracted video track and the extracted -> decoded -> encoded audio track, but while the video plays fine, the audio starts with a short burst of noise, then what seems like the last couple seconds of audio data playing normally (but at the beginning of the video), then silence for the rest of the video.

一些相关领域:

private MediaFormat audioFormat;
private MediaFormat videoFormat;
private int videoTrackIndex = -1;
private int audioTrackIndex = -1;
private static final int MAX_BUFFER_SIZE = 256 * 1024;

// parameters for the audio encoder
private static final String OUTPUT_AUDIO_MIME_TYPE = "audio/mp4a-latm"; // Advanced Audio Coding
private static final int OUTPUT_AUDIO_CHANNEL_COUNT = 2; // Must match the input stream. not using this, getting from input format
private static final int OUTPUT_AUDIO_BIT_RATE = 128 * 1024;
private static final int OUTPUT_AUDIO_AAC_PROFILE = MediaCodecInfo.CodecProfileLevel.AACObjectHE; //not using this, getting from input format 
private static final int OUTPUT_AUDIO_SAMPLE_RATE_HZ = 44100; // Must match the input stream
private static final String TAG = "vvsLog";
private static final Boolean DEBUG = false;
private static final Boolean INFO = true;
/** How long to wait for the next buffer to become available. */
private static final int TIMEOUT_USEC = 10000;
private String videoPath;

配置解码器、编码器和复用器的代码:

The code configuring the decoder, encoder and muxer:

MediaCodecInfo audioCodecInfo = selectCodec(OUTPUT_AUDIO_MIME_TYPE);
    if (audioCodecInfo == null) {
        // Don't fail CTS if they don't have an AAC codec (not here, anyway).
        Log.e(TAG, "Unable to find an appropriate codec for " + OUTPUT_AUDIO_MIME_TYPE);
        return;
    }

    MediaExtractor videoExtractor = null;
    MediaExtractor audioExtractor = null;
    MediaCodec audioDecoder = null;
    MediaCodec audioEncoder = null;
    MediaMuxer muxer = null;

    try {

        /**
         * Video
         * just need to configure the extractor, no codec processing required
         */
        videoExtractor = createExtractor(originalAssetPath);
        String vidMimeStartsWith = "video/";
        int videoInputTrack = getAndSelectTrackIndex(videoExtractor, vidMimeStartsWith);
        videoFormat = videoExtractor.getTrackFormat(videoInputTrack);

        /**
         * Audio
         * needs an extractor plus an audio decoder and encoder
         */
        audioExtractor = createExtractor(originalAssetPath);
        String audMimeStartsWith = "audio/";
        int audioInputTrack = getAndSelectTrackIndex(audioExtractor, audMimeStartsWith);
        audioFormat = audioExtractor.getTrackFormat(audioInputTrack);
        audioFormat.setInteger(MediaFormat.KEY_SAMPLE_RATE,OUTPUT_AUDIO_SAMPLE_RATE_HZ);

        MediaFormat outputAudioFormat = MediaFormat.createAudioFormat(OUTPUT_AUDIO_MIME_TYPE,
                audioFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE),
                audioFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT));
        outputAudioFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, audioFormat.getInteger(MediaFormat.KEY_AAC_PROFILE));
        outputAudioFormat.setInteger(MediaFormat.KEY_BIT_RATE, OUTPUT_AUDIO_BIT_RATE);

        // Create a MediaCodec for the decoder, based on the extractor's format, configure and start it.
        audioDecoder = createAudioDecoder(audioFormat);
        // Create a MediaCodec for the desired codec, then configure it as an encoder and start it.
        audioEncoder = createAudioEncoder(audioCodecInfo, outputAudioFormat);

        //create muxer to overwrite original asset path
        muxer = createMuxer(originalAssetPath);

        //add the video and audio tracks
        /**
         * need to wait to add the audio track until after the first encoder output buffer is created
         * since the encoder changes the MediaFormat at that time
         * and the muxer needs the correct format, including the correct Coded Specific Data (CSD) ByteBuffer
         */

        doExtractDecodeEditEncodeMux(
                videoExtractor,
                audioExtractor,
                audioDecoder,
                audioEncoder,
                muxer);

    }

怪物 doExtractDecodeEditEncodeMux 方法:

The monster doExtractDecodeEditEncodeMux method:

private void doExtractDecodeEditEncodeMux(
        MediaExtractor videoExtractor,
        MediaExtractor audioExtractor,
        MediaCodec audioDecoder,
        MediaCodec audioEncoder,
        MediaMuxer muxer) {

    ByteBuffer videoInputBuffer = ByteBuffer.allocate(MAX_BUFFER_SIZE);
    MediaCodec.BufferInfo videoBufferInfo = new MediaCodec.BufferInfo();

    ByteBuffer[] audioDecoderInputBuffers = null;
    ByteBuffer[] audioDecoderOutputBuffers = null;
    ByteBuffer[] audioEncoderInputBuffers = null;
    ByteBuffer[] audioEncoderOutputBuffers = null;
    MediaCodec.BufferInfo audioDecoderOutputBufferInfo = null;
    MediaCodec.BufferInfo audioEncoderOutputBufferInfo = null;

    audioDecoderInputBuffers = audioDecoder.getInputBuffers();
    audioDecoderOutputBuffers =  audioDecoder.getOutputBuffers();
    audioEncoderInputBuffers = audioEncoder.getInputBuffers();
    audioEncoderOutputBuffers = audioEncoder.getOutputBuffers();
    audioDecoderOutputBufferInfo = new MediaCodec.BufferInfo();
    audioEncoderOutputBufferInfo = new MediaCodec.BufferInfo();

    /**
     * sanity checks
     */
    //frames
    int videoExtractedFrameCount = 0;
    int audioExtractedFrameCount = 0;
    int audioDecodedFrameCount = 0;
    int audioEncodedFrameCount = 0;
    //times
    long lastPresentationTimeVideoExtractor = 0;
    long lastPresentationTimeAudioExtractor = 0;
    long lastPresentationTimeAudioDecoder = 0;
    long lastPresentationTimeAudioEncoder = 0;

    // We will get these from the decoders when notified of a format change.
    MediaFormat decoderOutputAudioFormat = null;
    // We will get these from the encoders when notified of a format change.
    MediaFormat encoderOutputAudioFormat = null;
    // We will determine these once we have the output format.
    int outputAudioTrack = -1;
    // Whether things are done on the video side.
    boolean videoExtractorDone = false;
    // Whether things are done on the audio side.
    boolean audioExtractorDone = false;
    boolean audioDecoderDone = false;
    boolean audioEncoderDone = false;
    // The audio decoder output buffer to process, -1 if none.
    int pendingAudioDecoderOutputBufferIndex = -1;

    boolean muxing = false;

    /**
     * need to wait to add the audio track until after the first encoder output buffer is created
     * since the encoder changes the MediaFormat at that time
     * and the muxer needs the correct format, including the correct Coded Specific Data (CSD) ByteBuffer
     * muxer.start();
     * muxing = true;
     */

    MediaMetadataRetriever retrieverTest = new MediaMetadataRetriever();
    retrieverTest.setDataSource(videoPath);
    String degreesStr = retrieverTest.extractMetadata(MediaMetadataRetriever.METADATA_KEY_VIDEO_ROTATION);
    if (degreesStr != null) {
        Integer degrees = Integer.parseInt(degreesStr);
        if (degrees >= 0) {
            muxer.setOrientationHint(degrees);
        }
    }

    while (!videoExtractorDone || !audioEncoderDone) {
        if (INFO) {
            Log.d(TAG, String.format("ex:%d at %d | de:%d at %d | en:%d at %d ",
                    audioExtractedFrameCount, lastPresentationTimeAudioExtractor,
                    audioDecodedFrameCount, lastPresentationTimeAudioDecoder,
                    audioEncodedFrameCount, lastPresentationTimeAudioEncoder
                    ));
        }
        /**
         * Extract and mux video
         */
        while (!videoExtractorDone && muxing) {

            try {
                videoBufferInfo.size = videoExtractor.readSampleData(videoInputBuffer, 0);
            } catch (Exception e) {
                e.printStackTrace();
            }

            if (videoBufferInfo.size < 0) {
                videoBufferInfo.size = 0;
                videoExtractorDone = true;
            } else {
                videoBufferInfo.presentationTimeUs = videoExtractor.getSampleTime();
                lastPresentationTimeVideoExtractor = videoBufferInfo.presentationTimeUs;
                        videoBufferInfo.flags = videoExtractor.getSampleFlags();
                muxer.writeSampleData(videoTrackIndex, videoInputBuffer, videoBufferInfo);
                videoExtractor.advance();
                videoExtractedFrameCount++;
            }
        }

        /**
         * Extract, decode, watermark, encode and mux audio
         */

        /** Extract audio from file and feed to decoder. **/
        while (!audioExtractorDone  && (encoderOutputAudioFormat == null || muxing)) {
            int decoderInputBufferIndex = audioDecoder.dequeueInputBuffer(TIMEOUT_USEC);
            if (decoderInputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned input buffer: " + decoderInputBufferIndex);
            }
            ByteBuffer decoderInputBuffer = audioDecoderInputBuffers[decoderInputBufferIndex];
            int size = audioExtractor.readSampleData(decoderInputBuffer, 0);
            long presentationTime = audioExtractor.getSampleTime();
            lastPresentationTimeAudioExtractor = presentationTime;
            if (DEBUG) {
                Log.d(TAG, "audio extractor: returned buffer of size " + size);
                Log.d(TAG, "audio extractor: returned buffer for time " + presentationTime);
            }
            if (size >= 0) {
                audioDecoder.queueInputBuffer(
                        decoderInputBufferIndex,
                        0,
                        size,
                        presentationTime,
                        audioExtractor.getSampleFlags());
            }
            audioExtractorDone = !audioExtractor.advance();
            if (audioExtractorDone) {
                if (DEBUG) Log.d(TAG, "audio extractor: EOS");
                audioDecoder.queueInputBuffer(
                        decoderInputBufferIndex,
                        0,
                        0,
                        0,
                        MediaCodec.BUFFER_FLAG_END_OF_STREAM);
            }
            audioExtractedFrameCount++;
            // We extracted a frame, let's try something else next.
            break;
        }

        /**
         * Poll output frames from the audio decoder.
         * Do not poll if we already have a pending buffer to feed to the encoder.
         */
        while (!audioDecoderDone && pendingAudioDecoderOutputBufferIndex == -1 && (encoderOutputAudioFormat == null || muxing)) {
            int decoderOutputBufferIndex =
                    audioDecoder.dequeueOutputBuffer(
                            audioDecoderOutputBufferInfo, TIMEOUT_USEC);
            if (decoderOutputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (DEBUG) Log.d(TAG, "no audio decoder output buffer");
                break;
            }
            if (decoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
                if (DEBUG) Log.d(TAG, "audio decoder: output buffers changed");
                audioDecoderOutputBuffers = audioDecoder.getOutputBuffers();
                break;
            }
            if (decoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                decoderOutputAudioFormat = audioDecoder.getOutputFormat();
                if (DEBUG) {
                    Log.d(TAG, "audio decoder: output format changed: "
                            + decoderOutputAudioFormat);
                }
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned output buffer: "
                        + decoderOutputBufferIndex);
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned buffer of size "
                        + audioDecoderOutputBufferInfo.size);
            }
            ByteBuffer decoderOutputBuffer =
                    audioDecoderOutputBuffers[decoderOutputBufferIndex];
            if ((audioDecoderOutputBufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG)
                    != 0) {
                if (DEBUG) Log.d(TAG, "audio decoder: codec config buffer");
                audioDecoder.releaseOutputBuffer(decoderOutputBufferIndex, false);
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: returned buffer for time "
                        + audioDecoderOutputBufferInfo.presentationTimeUs);
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: output buffer is now pending: "
                        + pendingAudioDecoderOutputBufferIndex);
            }
            pendingAudioDecoderOutputBufferIndex = decoderOutputBufferIndex;
            audioDecodedFrameCount++;
            // We extracted a pending frame, let's try something else next.
            break;
        }

        // Feed the pending decoded audio buffer to the audio encoder.
        while (pendingAudioDecoderOutputBufferIndex != -1) {
            if (DEBUG) {
                Log.d(TAG, "audio decoder: attempting to process pending buffer: "
                        + pendingAudioDecoderOutputBufferIndex);
            }
            int encoderInputBufferIndex = audioEncoder.dequeueInputBuffer(TIMEOUT_USEC);
            if (encoderInputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (DEBUG) Log.d(TAG, "no audio encoder input buffer");
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio encoder: returned input buffer: " + encoderInputBufferIndex);
            }
            ByteBuffer encoderInputBuffer = audioEncoderInputBuffers[encoderInputBufferIndex];
            int size = audioDecoderOutputBufferInfo.size;
            long presentationTime = audioDecoderOutputBufferInfo.presentationTimeUs;
            lastPresentationTimeAudioDecoder = presentationTime;
            if (DEBUG) {
                Log.d(TAG, "audio decoder: processing pending buffer: "
                        + pendingAudioDecoderOutputBufferIndex);
            }
            if (DEBUG) {
                Log.d(TAG, "audio decoder: pending buffer of size " + size);
                Log.d(TAG, "audio decoder: pending buffer for time " + presentationTime);
            }
            if (size >= 0) {
                ByteBuffer decoderOutputBuffer =
                        audioDecoderOutputBuffers[pendingAudioDecoderOutputBufferIndex]
                                .duplicate();
                decoderOutputBuffer.position(audioDecoderOutputBufferInfo.offset);
                decoderOutputBuffer.limit(audioDecoderOutputBufferInfo.offset + size);
                encoderInputBuffer.position(0);
                encoderInputBuffer.put(decoderOutputBuffer);
                audioEncoder.queueInputBuffer(
                        encoderInputBufferIndex,
                        0,
                        size,
                        presentationTime,
                        audioDecoderOutputBufferInfo.flags);
            }
            audioDecoder.releaseOutputBuffer(pendingAudioDecoderOutputBufferIndex, false);
            pendingAudioDecoderOutputBufferIndex = -1;
            if ((audioDecoderOutputBufferInfo.flags
                    & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                if (DEBUG) Log.d(TAG, "audio decoder: EOS");
                audioDecoderDone = true;
            }
            // We enqueued a pending frame, let's try something else next.
            break;
        }

        // Poll frames from the audio encoder and send them to the muxer.
        while (!audioEncoderDone && (encoderOutputAudioFormat == null || muxing)) {
            int encoderOutputBufferIndex = audioEncoder.dequeueOutputBuffer(
                    audioEncoderOutputBufferInfo, TIMEOUT_USEC);
            if (encoderOutputBufferIndex == MediaCodec.INFO_TRY_AGAIN_LATER) {
                if (DEBUG) Log.d(TAG, "no audio encoder output buffer");
                break;
            }
            if (encoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED) {
                if (DEBUG) Log.d(TAG, "audio encoder: output buffers changed");
                audioEncoderOutputBuffers = audioEncoder.getOutputBuffers();
                break;
            }
            if (encoderOutputBufferIndex == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
                encoderOutputAudioFormat = audioEncoder.getOutputFormat();
                if (DEBUG) {
                    Log.d(TAG, "audio encoder: output format changed");
                }
                if (outputAudioTrack >= 0) {
                    Log.e(TAG,"audio encoder changed its output format again?");
                }
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio encoder: returned output buffer: "
                        + encoderOutputBufferIndex);
                Log.d(TAG, "audio encoder: returned buffer of size "
                        + audioEncoderOutputBufferInfo.size);
            }
            ByteBuffer encoderOutputBuffer =
                    audioEncoderOutputBuffers[encoderOutputBufferIndex];
            if ((audioEncoderOutputBufferInfo.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG)
                    != 0) {
                if (DEBUG) Log.d(TAG, "audio encoder: codec config buffer");
                // Simply ignore codec config buffers.
                audioEncoder.releaseOutputBuffer(encoderOutputBufferIndex, false);
                break;
            }
            if (DEBUG) {
                Log.d(TAG, "audio encoder: returned buffer for time "
                        + audioEncoderOutputBufferInfo.presentationTimeUs);
            }
            if (audioEncoderOutputBufferInfo.size != 0) {
                lastPresentationTimeAudioEncoder = audioEncoderOutputBufferInfo.presentationTimeUs;
                muxer.writeSampleData(
                        audioTrackIndex, encoderOutputBuffer, audioEncoderOutputBufferInfo);
            }
            if ((audioEncoderOutputBufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM)
                    != 0) {
                if (DEBUG) Log.d(TAG, "audio encoder: EOS");
                audioEncoderDone = true;
            }
            audioEncoder.releaseOutputBuffer(encoderOutputBufferIndex, false);
            audioEncodedFrameCount++;
            // We enqueued an encoded frame, let's try something else next.
            break;
        }

        if (!muxing && (encoderOutputAudioFormat != null)) {

            Log.d(TAG, "muxer: adding video track.");
            videoTrackIndex = muxer.addTrack(videoFormat);

            Log.d(TAG, "muxer: adding audio track.");
            audioTrackIndex = muxer.addTrack(encoderOutputAudioFormat);

            Log.d(TAG, "muxer: starting");
            muxer.start();
            muxing = true;
        }
    }
    /**
     * Done processing audio and video
     */
    Log.d(TAG,"encoded and decoded audio frame counts should match. decoded:"+audioDecodedFrameCount+" encoded:"+audioEncodedFrameCount);

    Log.d(TAG,"decoded frame count should be less than extracted frame coun. decoded:"+audioDecodedFrameCount+" extracted:"+audioExtractedFrameCount);
    Log.d(TAG,"no audio frame should be pending "+pendingAudioDecoderOutputBufferIndex);

    PluginResult result = new PluginResult(PluginResult.Status.OK, videoPath);
    result.setKeepCallback(false);
    callbackContext.sendPluginResult(result);

}

我在提取的前几百个音频帧中看到此 ACodec 错误:

I am seeing this ACodec error for the first several hundred audio frames extracted:

11-25 20:49:58.497   9807-13101/com.vvs.VVS430011 E/ACodec﹕ OMXCodec::onEvent, OMX_ErrorStreamCorrupt
11-25 20:49:58.497   9807-13101/com.vvs.VVS430011 W/AHierarchicalStateMachine﹕ Warning message AMessage(what = 'omx ', target = 8) = {
    int32_t type = 0
    int32_t node = 7115
    int32_t event = 1
    int32_t data1 = -2147479541
    int32_t data2 = 0
    } unhandled in root state.

这是整个 logcat 的粘贴箱，其中包括一些格式为以下的完整性检查日志:

Here's a pastebin of the entire logcat, which includes some sanity check logs in the format of:

D/vvsLog﹕ ex:{extracted frame #} at {presentationTime} | de:{decoded frame #} at {presentationTime} | en:{encoded frame #} at {presentationTime}

当出现那些 OMX_ErrorStreamCorrupt 消息时，编码和解码帧的presentationTime 似乎增加得太快了.当它们停止时，解码和编码帧的presentationTime 似乎恢复到正常"，并且似乎与我在视频开头听到的实际好"音频相匹配——好"音频来自原始音轨的结尾.

The presentationTime of encoded and decoded frames seems to be incrementing too quickly while those OMX_ErrorStreamCorrupt messages are appearing. When they stop, the presentationTime for the decoded and encoded frames seems to return to "normal", and also seems to match up with the actual "good" audio I hear at the beginning of the video - the "good" audio being from the end of the original audio track.

我希望对这些低级 Android 多媒体 API 有比我更多经验的人可以帮助我理解为什么会发生这种情况.请记住，我很清楚这段代码没有经过优化，在单独的线程中运行等等. - 一旦我有了基本提取->解码->编辑->编码的工作示例，我将重构以清理内容->多路复用过程.

I am hoping someone with a lot more experience with these low-level Android multimedia APIs than I have can help me understand why this is happening. Keep in mind I am well aware that this code is not optimized, running in separate threads, etc.. - I will refactor to clean things up once I have a working example of the basic extract->decode->edit->encode->mux process.

谢谢！

Android 提取解码编码多路复用音频 [英] Android Extract Decode Encode Mux Audio

问题描述

推荐答案

相关文章

移动开发最新文章

热门教程

热门工具

登录关闭

Android 提取解码编码多路复用音频 [英] Android Extract Decode Encode Mux Audio

问题描述

推荐答案

相关文章

移动开发最新文章

热门教程

热门工具

登录 关闭

登录关闭