AVAudioEngine reconcile/sync input/output timestamps on macOS/iOS


Problem Description


I'm attempting to sync recorded audio (from an AVAudioEngine inputNode) to an audio file that was playing during the recording process. The result should be like multitrack recording where each subsequent new track is synced with the previous tracks that were playing at the time of recording.

Because sampleTime differs between the AVAudioEngine's output and input nodes, I use hostTime to determine the offset between the original audio and the input buffers.
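For reference, here's roughly how the secondsToTicks factor mentioned in the code below can be derived from mach_timebase_info (a minimal sketch; the variable names are mine and the project code may differ slightly):

import Darwin

var timebaseInfo = mach_timebase_info_data_t()
mach_timebase_info(&timebaseInfo)

// One host tick expressed in seconds (numer/denom gives nanoseconds per tick)
let ticksToSeconds = Double(timebaseInfo.numer) / Double(timebaseInfo.denom) / 1_000_000_000
// Host ticks per second -- the secondsToTicks factor used in the code below
let secondsToTicks = 1.0 / ticksToSeconds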

On iOS, I would assume that I'd have to use AVAudioSession's various latency properties (inputLatency, outputLatency, ioBufferDuration) to reconcile the tracks as well as the host time offset, but I haven't figured out the magic combination to make them work. The same goes for the various AVAudioEngine and Node properties like latency and presentationLatency.

On macOS, AVAudioSession doesn't exist (outside of Catalyst), meaning I don't have access to those numbers. Meanwhile, the latency/presentationLatency properties on the AVAudioNodes report 0.0 in most circumstances. On macOS, I do have access to AudioObjectGetPropertyData and can ask the system about kAudioDevicePropertyLatency, kAudioDevicePropertyBufferSize, kAudioDevicePropertySafetyOffset, etc., but I'm again at a bit of a loss as to what the formula is to reconcile all of these.

I have a sample project at https://github.com/jnpdx/AudioEngineLoopbackLatencyTest that runs a simple loopback test (on macOS, iOS, or Mac Catalyst) and shows the result. On my Mac, the offset between tracks is ~720 samples. On others' Macs, I've seen as much as 1500 samples offset.

On my iPhone, I can get it close to sample-perfect by using AVAudioSession's outputLatency + inputLatency. However, the same formula leaves things misaligned on my iPad.
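For completeness, the iOS adjustment I'm describing is essentially this (a sketch with illustrative names, not the exact project code; iOS/Mac Catalyst only):

import AVFoundation

let session = AVAudioSession.sharedInstance()
// Round-trip latency in seconds as reported by the session
let roundTripLatency = session.inputLatency + session.outputLatency
// The same latency expressed in frames at the session's sample rate
let roundTripLatencyFrames = AVAudioFramePosition((roundTripLatency * session.sampleRate).rounded())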

What's the magic formula for syncing the input and output timestamps on each platform? I know it may be different on each, which is fine, and I know I won't get 100% accuracy, but I would like to get as close as possible before going through my own calibration process.

Here's a sample of my current code (full sync logic can be found at https://github.com/jnpdx/AudioEngineLoopbackLatencyTest/blob/main/AudioEngineLoopbackLatencyTest/AudioManager.swift):

//Schedule playback of original audio during initial playback
let delay = 0.33 * state.secondsToTicks
let audioTime = AVAudioTime(hostTime: mach_absolute_time() + UInt64(delay))
state.audioBuffersScheduledAtHost = audioTime.hostTime

...

//in the inputNode's inputTap, store the first timestamp
audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (pcmBuffer, timestamp) in
    if self.state.inputNodeTapBeganAtHost == 0 {
        self.state.inputNodeTapBeganAtHost = timestamp.hostTime
    }
}

...

//after playback, attempt to reconcile/sync the timestamps recorded above

let timestampToSyncTo = state.audioBuffersScheduledAtHost
let inputNodeHostTimeDiff = Int64(state.inputNodeTapBeganAtHost) - Int64(timestampToSyncTo)
let inputNodeDiffInSamples = Double(inputNodeHostTimeDiff) / state.secondsToTicks * inputFileBuffer.format.sampleRate //secondsToTicks is calculated using mach_timebase_info

//play the original metronome audio at sample position 0 and try to sync everything else up to it
let originalAudioTime = AVAudioTime(sampleTime: 0, atRate: renderingEngine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
originalAudioPlayerNode.scheduleBuffer(metronomeFileBuffer, at: originalAudioTime, options: []) {
  print("Played original audio")
}

//play the tap of the input node at its determined sync time -- this _does not_ appear to line up in the result file
let inputAudioTime = AVAudioTime(sampleTime: AVAudioFramePosition(inputNodeDiffInSamples), atRate: renderingEngine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
recordedInputNodePlayer.scheduleBuffer(inputFileBuffer, at: inputAudioTime, options: []) {
  print("Input buffer played")
}


When running the sample app, here's the result I get:

Solution

This answer is applicable to native macOS only

General Latency Determination

Output

In the general case the output latency for a stream on a device is determined by the sum of the following properties:

  1. kAudioDevicePropertySafetyOffset
  2. kAudioStreamPropertyLatency
  3. kAudioDevicePropertyLatency
  4. kAudioDevicePropertyBufferFrameSize

The device safety offset, stream, and device latency values should be retrieved for kAudioObjectPropertyScopeOutput.
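As a sketch of how the device-scoped values can be read (error handling elided; note that kAudioStreamPropertyLatency is queried on the individual stream objects obtained via kAudioDevicePropertyStreams rather than on the device itself):

import CoreAudio

func deviceFrames(_ deviceID: AudioObjectID,
                  selector: AudioObjectPropertySelector,
                  scope: AudioObjectPropertyScope) -> UInt32 {
    var address = AudioObjectPropertyAddress(
        mSelector: selector,                       // e.g. kAudioDevicePropertyLatency
        mScope: scope,                             // kAudioObjectPropertyScopeOutput or ...Input
        mElement: kAudioObjectPropertyElementMain) // kAudioObjectPropertyElementMaster on older SDKs
    var frames: UInt32 = 0
    var size = UInt32(MemoryLayout<UInt32>.size)
    AudioObjectGetPropertyData(deviceID, &address, 0, nil, &size, &frames)
    return frames
}

// e.g. deviceFrames(deviceID, selector: kAudioDevicePropertySafetyOffset, scope: kAudioObjectPropertyScopeOutput)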

On my Mac for the audio device MacBook Pro Speakers at 44.1 kHz this equates to 71 + 424 + 11 + 512 = 1018 frames.

Input

Similarly, the input latency is determined by the sum of the following properties:

  1. kAudioDevicePropertySafetyOffset
  2. kAudioStreamPropertyLatency
  3. kAudioDevicePropertyLatency
  4. kAudioDevicePropertyBufferFrameSize

The device safety offset, stream, and device latency values should be retrieved for kAudioObjectPropertyScopeInput.

On my Mac for the audio device MacBook Pro Microphone at 44.1 kHz this equates to 114 + 2404 + 40 + 512 = 3070 frames.

AVAudioEngine

How the information above relates to AVAudioEngine is not immediately clear. Internally AVAudioEngine creates a private aggregate device and Core Audio essentially handles latency compensation for aggregate devices automatically.
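For anyone wanting to inspect that private aggregate device directly, one way to obtain its AudioObjectID is to ask the output node's underlying I/O audio unit for its current device (a sketch, assuming the engine has been started so the unit exists):

import AVFoundation
import AudioToolbox

func engineDeviceID(_ engine: AVAudioEngine) -> AudioDeviceID? {
    // The output node's underlying AUHAL unit knows which device it renders to
    guard let unit = engine.outputNode.audioUnit else { return nil }
    var deviceID = AudioDeviceID(0)
    var size = UInt32(MemoryLayout<AudioDeviceID>.size)
    let status = AudioUnitGetProperty(unit,
                                      kAudioOutputUnitProperty_CurrentDevice,
                                      kAudioUnitScope_Global,
                                      0,
                                      &deviceID,
                                      &size)
    return status == noErr ? deviceID : nil
}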

During experimentation for this answer I've found that some (most?) audio devices don't report latency correctly. At least that is how it seems, which makes accurate latency determination nigh impossible.

I was able to get fairly accurate synchronization using my Mac's built-in audio using the following adjustments:

// Some non-zero value to get AVAudioEngine running
let startDelay = 0.1

// The original audio file start time
let originalStartingFrame: AVAudioFramePosition = AVAudioFramePosition(playerNode.outputFormat(forBus: 0).sampleRate * startDelay)

// The output tap's first sample is delivered to the device after the buffer is filled once
// A number of zero samples equal to the buffer size is produced initially
let outputStartingFrame: AVAudioFramePosition = Int64(state.outputBufferSizeFrames)

// The first output sample makes its way back into the input tap after accounting for all the latencies
let inputStartingFrame: AVAudioFramePosition = outputStartingFrame - Int64(state.outputLatency + state.outputStreamLatency + state.outputSafetyOffset + state.inputSafetyOffset + state.inputLatency + state.inputStreamLatency)

On my Mac the values reported by the AVAudioEngine aggregate device were:

// Output:
// kAudioDevicePropertySafetyOffset:    144
// kAudioDevicePropertyLatency:          11
// kAudioStreamPropertyLatency:         424
// kAudioDevicePropertyBufferFrameSize: 512

// Input:
// kAudioDevicePropertySafetyOffset:     154
// kAudioDevicePropertyLatency:            0
// kAudioStreamPropertyLatency:         2404
// kAudioDevicePropertyBufferFrameSize:  512

which equated to the following offsets:

originalStartingFrame =  4410
outputStartingFrame   =   512
inputStartingFrame    = -2625
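For reference, those offsets follow directly from the reported values: originalStartingFrame = 44,100 samples/s * 0.1 s = 4410, outputStartingFrame is simply the 512-frame buffer size, and inputStartingFrame = 512 - (144 + 11 + 424 + 154 + 0 + 2404) = -2625 frames.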
