Isolating audio foreground and converting back to audio stream using librosa


Problem description

I'm trying to isolate the foreground of an audio stream and then save it as a standalone audio stream using librosa.

I have the full, foreground and background data isolated, as the example does, in S_full, S_foreground and S_background, but I'm unsure how to turn those back into audio.

I attempted to use librosa.istft(...) to convert them and then save the result as a .wav file using soundfile.write(...), but I'm left with a file of roughly the right size that contains unusable(?) data.

Can anyone describe how to do this or point me at an example?

Thanks.

Recommended answer

In putting together a minimal example, I found that istft() with the original sampling rate does in fact work.

I'll find my bug somewhere. FWIW, here's the working code:

import numpy as np
import librosa
from librosa import display
import soundfile
import matplotlib.pyplot as plt

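# Load the first 5 seconds and split the STFT into magnitude and phase.
y, sr = librosa.load('audio/rb-testspeech.mp3', duration=5)
S_full, phase = librosa.magphase(librosa.stft(y))

# Estimate the repeating background: each frame is replaced by the median of
# its most similar frames (cosine similarity) that are at least 2 seconds away.
S_filter = librosa.decompose.nn_filter(S_full,
                                       aggregate=np.median,
                                       metric='cosine',
                                       width=int(librosa.time_to_frames(2, sr=sr)))
# The background estimate cannot exceed the input magnitude.
S_filter = np.minimum(S_full, S_filter)

# margin_v scales the reference used by the foreground mask;
# margin_i would play the same role for a background mask and is unused here.
margin_i, margin_v = 2, 10
power = 2

# Soft mask that favours bins where the residual dominates the background.
mask_v = librosa.util.softmask(S_full - S_filter,
                               margin_v * S_filter,
                               power=power)

# Foreground magnitude spectrogram.
S_foreground = mask_v * S_full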

full = librosa.amplitude_to_db(S_full, ref=np.max)
librosa.display.specshow(full, y_axis='log', sr=sr)

plt.title('Full spectrum')
plt.colorbar()

plt.tight_layout()
plt.show()

print("y({}): {}".format(len(y),y))
print("sr: {}".format(sr))

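# Reattach the original phase before inverting: S_full and S_foreground hold
# magnitudes only, and istft() on magnitudes alone yields distorted audio.
full_audio = librosa.istft(S_full * phase)
foreground_audio = librosa.istft(S_foreground * phase)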
print("full({}): {}".format(len(full_audio), full_audio))

soundfile.write('orig.WAV', y, sr) 
soundfile.write('full.WAV', full_audio, sr) 
soundfile.write('foreground.WAV', foreground_audio, sr) 
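
As a small illustration of why the phase term matters when a magnitude spectrogram is inverted, here is a minimal sketch. It is an assumption-laden stand-in, not the code above: it uses a single NumPy FFT frame instead of librosa's STFT, and the test signal and variable names are hypothetical. The idea is the same as multiplying the magnitudes by the phase returned from librosa.magphase before calling istft().

```python
import numpy as np

# Hypothetical test signal: one frame holding two sinusoids with different phases.
n = 256
t = np.arange(n)
frame = np.sin(2 * np.pi * 5 * t / n) + 0.5 * np.cos(2 * np.pi * 12 * t / n + 0.7)

# Split the spectrum into magnitude and a unit-magnitude complex phase term.
spectrum = np.fft.rfft(frame)
magnitude = np.abs(spectrum)
phase = np.exp(1j * np.angle(spectrum))

# Invert once with the phase reattached and once from the magnitudes alone.
with_phase = np.fft.irfft(magnitude * phase, n=n)
magnitude_only = np.fft.irfft(magnitude, n=n)

err_with_phase = np.max(np.abs(with_phase - frame))
err_magnitude_only = np.max(np.abs(magnitude_only - frame))
print("max error with phase:     ", err_with_phase)
print("max error magnitude only: ", err_magnitude_only)
```

With the phase reattached the frame is recovered up to floating-point error, while the magnitude-only inversion has the right energy at each frequency but a visibly different waveform. That would be consistent with a file of roughly the right size containing unusable audio.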
