Writing video + generated audio to AVAssetWriterInput, audio stuttering

Question

I'm generating a video from a Unity app on iOS. I'm using iVidCap, which uses AVFoundation to do this. That side is all working fine. Essentially the video is rendered by using a texture render target and passing the frames to an Obj-C plugin.

Now I need to add audio to the video. The audio is going to be sound effects that occur at specific times and maybe some background sound. The files being used are actually assets internal to the Unity app. I could potentially write these to phone storage and then generate an AVComposition, but my plan was to avoid this and composite the audio in floating-point buffers (audio obtained from the audio clips is already in float format). I might be doing some on-the-fly audio effects later on.
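
Just to make that hand-off concrete, here is a minimal sketch of the kind of plugin bridge this plan implies. The function name and the gVideoRecorder global are assumptions for illustration, not part of iVidCap's actual API; the recorder object is assumed to implement the writeAudioBuffer:sampleCount:channelCount: method shown further down.

    // Hypothetical bridge (illustrative names only): Unity composites interleaved
    // float samples and pushes them to the Obj-C object that owns the AVAssetWriter.
    void MyPlugin_AppendAudio(float* interleavedSamples, int frameCount, int channelCount)
    {
        [gVideoRecorder writeAudioBuffer: interleavedSamples
                             sampleCount: (size_t)frameCount
                            channelCount: (size_t)channelCount];
    }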

After several hours I managed to get audio to be recorded and play back with the video... but it stutters.

Currently I'm just generating a square wave for the duration of each frame of video and writing it to an AVAssetWriterInput. Later, I'll generate the audio I actually want. If I generate one massive sample, I don't get the stuttering. If I write it in blocks (which I'd much prefer over allocating a massive array), then the blocks of audio seem to clip each other.

I can't seem to figure out why this is. I am pretty sure I am getting the timestamp for the audio buffers correct, but maybe I'm doing this whole part incorrectly. Or do I need some flags to get the video to sync to the audio? I can't see that this is the problem, since I can see the problem in a wave editor after extracting the audio data to a wav.

The relevant code for writing the audio:

    - (id)init
    {
        self = [super init];

        if (self) {

            // [snip]

            rateDenominator = 44100;
            rateMultiplier = rateDenominator / frameRate;

            sample_position_ = 0;
            audio_fmt_desc_ = nil;
            int nchannels = 2;
            AudioStreamBasicDescription audioFormat;
            bzero(&audioFormat, sizeof(audioFormat));
            audioFormat.mSampleRate = 44100;
            audioFormat.mFormatID   = kAudioFormatLinearPCM;
            audioFormat.mFramesPerPacket = 1;
            audioFormat.mChannelsPerFrame = nchannels;
            int bytes_per_sample = sizeof(float);
            audioFormat.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsAlignedHigh;
            audioFormat.mBitsPerChannel = bytes_per_sample * 8;
            audioFormat.mBytesPerPacket = bytes_per_sample * nchannels;
            audioFormat.mBytesPerFrame = bytes_per_sample * nchannels;

            CMAudioFormatDescriptionCreate(kCFAllocatorDefault,
                                           &audioFormat,
                                           0,
                                           NULL,
                                           0,
                                           NULL,
                                           NULL,
                                           &audio_fmt_desc_);
        }

        return self;
    }

-(BOOL) beginRecordingSession {

    NSError* error = nil;

    isAborted = false;
    abortCode = No_Abort;

    // Allocate the video writer object.  
    videoWriter = [[AVAssetWriter alloc] initWithURL:[self getVideoFileURLAndRemoveExisting:
        recordingPath] fileType:AVFileTypeMPEG4 error:&error];

    if (error) {
        NSLog(@"Start recording error: %@", error);
    }

    //Configure video compression settings.
    NSDictionary* videoCompressionProps = [NSDictionary dictionaryWithObjectsAndKeys:
                                           [NSNumber numberWithDouble:1024.0 * 1024.0], AVVideoAverageBitRateKey,
                                           [NSNumber numberWithInt:10],AVVideoMaxKeyFrameIntervalKey,
                                            nil ];

    //Configure video settings.
    NSDictionary* videoSettings = [NSDictionary dictionaryWithObjectsAndKeys:
                                   AVVideoCodecH264, AVVideoCodecKey,
                                   [NSNumber numberWithInt:frameSize.width], AVVideoWidthKey,
                                   [NSNumber numberWithInt:frameSize.height], AVVideoHeightKey,
                                   videoCompressionProps, AVVideoCompressionPropertiesKey,
                                   nil];

    // Create the video writer that is used to append video frames to the output video
    // stream being written by videoWriter.
    videoWriterInput = [[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo outputSettings:videoSettings] retain];
    //NSParameterAssert(videoWriterInput);
    videoWriterInput.expectsMediaDataInRealTime = YES;

    // Configure settings for the pixel buffer adaptor.
    NSDictionary* bufferAttributes = [NSDictionary dictionaryWithObjectsAndKeys:
                                      [NSNumber numberWithInt:kCVPixelFormatType_32ARGB], kCVPixelBufferPixelFormatTypeKey, nil];

    // Create the pixel buffer adaptor, used to convert the incoming video frames and 
    // append them to videoWriterInput.
    avAdaptor = [[AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:videoWriterInput sourcePixelBufferAttributes:bufferAttributes] retain];

    [videoWriter addInput:videoWriterInput];

    // <pb> Added audio input.
    sample_position_ = 0;
    AudioChannelLayout acl;
    bzero( &acl, sizeof(acl));
    acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;


    NSDictionary* audioOutputSettings = nil;          

        audioOutputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
                               [ NSNumber numberWithInt: kAudioFormatMPEG4AAC ], AVFormatIDKey,
                               [ NSNumber numberWithInt: 2 ], AVNumberOfChannelsKey,
                               [ NSNumber numberWithFloat: 44100.0 ], AVSampleRateKey,
                               [ NSNumber numberWithInt: 64000 ], AVEncoderBitRateKey,
                               [ NSData dataWithBytes: &acl length: sizeof( acl ) ], AVChannelLayoutKey,
                               nil];

    audioWriterInput = [[AVAssetWriterInput 
                          assetWriterInputWithMediaType: AVMediaTypeAudio 
                          outputSettings: audioOutputSettings ] retain];

    //audioWriterInput.expectsMediaDataInRealTime = YES;
    audioWriterInput.expectsMediaDataInRealTime = NO; // seems to work slightly better

    [videoWriter addInput:audioWriterInput];

    rateDenominator = 44100;
    rateMultiplier = rateDenominator / frameRate;       

    // Add our video input stream source to the video writer and start it.
    [videoWriter startWriting];
    [videoWriter startSessionAtSourceTime:CMTimeMake(0, rateDenominator)];

    isRecording = true;
    return YES;
}    

        - (int) writeAudioBuffer: (float*) samples sampleCount: (size_t) n channelCount: (size_t) nchans
        {    
            if ( ![self waitForAudioWriterReadiness]) {
                NSLog(@"WARNING: writeAudioBuffer dropped frame after wait limit reached.");
                return 0;
            }

            //NSLog(@"writeAudioBuffer");
            OSStatus status;
            CMBlockBufferRef bbuf = NULL;
            CMSampleBufferRef sbuf = NULL;

            size_t buflen = n * nchans * sizeof(float);
            // Create sample buffer for adding to the audio input.
            status = CMBlockBufferCreateWithMemoryBlock(
                                                        kCFAllocatorDefault, 
                                                        samples, 
                                                        buflen, 
                                                        kCFAllocatorNull, 
                                                        NULL, 
                                                        0, 
                                                        buflen, 
                                                        0, 
                                                        &bbuf);

            if (status != noErr) {
                NSLog(@"CMBlockBufferCreateWithMemoryBlock error");
                return -1;
            }

            CMTime timestamp = CMTimeMake(sample_position_, 44100);
            sample_position_ += n;

            status = CMAudioSampleBufferCreateWithPacketDescriptions(kCFAllocatorDefault, bbuf, TRUE, 0, NULL, audio_fmt_desc_, 1, timestamp, NULL, &sbuf);
            if (status != noErr) {
                NSLog(@"CMSampleBufferCreate error");
                return -1;
            }
            BOOL r = [audioWriterInput appendSampleBuffer:sbuf];
            if (!r) {
                NSLog(@"appendSampleBuffer error");
            }
            CFRelease(bbuf);
            CFRelease(sbuf);

            return 0;
        }

Any ideas on what's going on?

Should I be creating/appending samples in a different way?

Is it something to do with the AAC compression? It doesn't work if I try to use uncompressed audio (it throws).

As far as I can tell, I'm calculating the PTS correctly. Why is this even required for the audio channel? Shouldn't the video be synced to the audio clock?

Update: I've tried providing the audio in fixed blocks of 1024 samples, since this is the size of the DCT used by the AAC compressor. Doesn't make any difference.
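
For reference, the fixed-block feeding amounts to something like the sketch below, using the writeAudioBuffer: method from the code above; the helper name and the interleaved-stereo assumption are mine.

    // Illustrative helper (not from the original code): slice a large interleaved
    // stereo buffer into 1024-frame blocks and append each one in order.
    - (void) writeAudioInFixedBlocks: (float*) interleaved frameCount: (size_t) totalFrames
    {
        const size_t kBlockFrames = 1024;   // one AAC packet worth of frames
        size_t offset = 0;
        while (offset < totalFrames) {
            size_t frames = MIN(kBlockFrames, totalFrames - offset);
            // Two interleaved channels, so advance by frames * 2 floats per block.
            [self writeAudioBuffer: interleaved + offset * 2
                       sampleCount: frames
                      channelCount: 2];
            offset += frames;
        }
    }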

I've tried pushing all the blocks in one go before writing any video. Doesn't work.

I've tried using CMSampleBufferCreate for the remaining blocks and CMAudioSampleBufferCreateWithPacketDescriptions for the first block only. No change.

And I've tried combinations of these. Still not right.

Solution:

It appears that:

audioWriterInput.expectsMediaDataInRealTime = YES;

is essential, otherwise the writer gets confused. Perhaps this is because the video input was set up with this flag. Additionally, CMBlockBufferCreateWithMemoryBlock does NOT copy the sample data, even if you pass it the kCMBlockBufferAlwaysCopyDataFlag flag.

So, a buffer can be created with that call and then copied using CMBlockBufferCreateContiguous to ensure that you get a block buffer with its own copy of the audio data. Otherwise the block buffer keeps referencing the memory you originally passed in, and things get messed up.
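
Putting the two findings together, a minimal sketch of that workaround (reusing the variables from writeAudioBuffer above) looks roughly like this:

    // When configuring the input (see beginRecordingSession above):
    audioWriterInput.expectsMediaDataInRealTime = YES;

    // Inside writeAudioBuffer: wrap the caller's memory, then force a copy so the
    // writer owns its own data. Passing kCFAllocatorDefault as the block allocator
    // asks Core Media to allocate the new backing memory itself.
    CMBlockBufferRef bbuf = NULL;
    CMBlockBufferRef copiedBuf = NULL;
    size_t buflen = n * nchans * sizeof(float);

    OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                         samples,          // caller-owned memory
                                                         buflen,
                                                         kCFAllocatorNull, // don't free it
                                                         NULL, 0, buflen, 0,
                                                         &bbuf);
    if (status == noErr) {
        status = CMBlockBufferCreateContiguous(kCFAllocatorDefault, bbuf,
                                               kCFAllocatorDefault, NULL,
                                               0, buflen, 0, &copiedBuf);
    }
    // Build the CMSampleBufferRef from copiedBuf instead of bbuf, append it as
    // before, then CFRelease both block buffers.

The copy matters because appendSampleBuffer: can return before the writer has consumed the data, so a block buffer that merely references the caller's memory may see that memory reused or freed while it is still being read.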

Answer

It looks ok, although I would use CMBlockBufferCreateWithMemoryBlock because it copies the samples. Is your code ok with not knowing when audioWriterInput has finished with them?

Shouldn't kAudioFormatFlagIsAlignedHigh be kAudioFormatFlagIsPacked?
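
If that suggestion is taken, the format flags in init would become packed floats instead of high-aligned ones; a sketch of the one-line change (same audioFormat variable as in the question's code) might be:

    // Suggested flag change: packed rather than high-aligned float samples.
    // kAudioFormatFlagsNativeFloatPacked expands to
    // kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked.
    audioFormat.mFormatFlags = kAudioFormatFlagsNativeFloatPacked;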
