Can I convert spectrograms generated with librosa back to audio?


Problem description


I converted some audio files to spectrograms and saved them to files using the following code:

import os
from matplotlib import pyplot as plt
import librosa
import librosa.display
import IPython.display as ipd

audio_fpath = "./audios/"
spectrograms_path = "./spectrograms/"
audio_clips = os.listdir(audio_fpath)

def generate_spectrogram(x, sr, save_name):
    X = librosa.stft(x)
    Xdb = librosa.amplitude_to_db(abs(X))
    fig = plt.figure(figsize=(20, 20), dpi=1000, frameon=False)
    ax = fig.add_axes([0, 0, 1, 1], frameon=False)
    ax.axis('off')
    librosa.display.specshow(Xdb, sr=sr, cmap='gray', x_axis='time', y_axis='hz')
    plt.savefig(save_name, bbox_inches='tight', pad_inches=0)
    plt.close(fig)  # close the figure so memory does not grow across clips
    librosa.cache.clear()

for i in audio_clips:
    audio_length = librosa.get_duration(filename=audio_fpath + i)
    j = 60
    # Process each clip in 60-second windows; the final (possibly shorter)
    # window is handled by clamping j to the total duration.
    while j < audio_length:
        x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
        save_name = spectrograms_path + i + str(j) + ".jpg"
        generate_spectrogram(x, sr, save_name)
        j += 60
        if j >= audio_length:
            j = audio_length
            x, sr = librosa.load(audio_fpath + i, offset=j-60, duration=60)
            save_name = spectrograms_path + i + str(j) + ".jpg"
            generate_spectrogram(x, sr, save_name)

I wanted to keep as much detail and quality from the audio as possible, so that I could turn the spectrograms back into audio without too much loss (they are 80 MB each).

Is it possible to turn them back to audio files? How can I do it?

I tried using librosa.feature.inverse.mel_to_audio, but it didn't work, and I don't think it applies here, since these are linear-frequency STFT spectrograms rather than mel spectrograms.

I now have 1300 spectrogram files and want to train a Generative Adversarial Network on them, so that I can generate new audio, but I don't want to do it if I won't be able to listen to the results later.

Solution

Yes, it is possible to recover most of the signal and estimate the phase with, e.g., the Griffin-Lim algorithm (GLA). A "fast" Python implementation is available in librosa. Here's how you can use it:

import numpy as np
import librosa

# Note: librosa.util.example_audio_file() is deprecated in newer librosa;
# use librosa.example('trumpet') there instead.
y, sr = librosa.load(librosa.util.example_audio_file(), duration=10)
S = np.abs(librosa.stft(y))    # magnitude only: the phase is discarded here
y_inv = librosa.griffinlim(S)  # Griffin-Lim estimates the phase and inverts

(Figure: waveform comparison of the original signal and the Griffin-Lim reconstruction.)

The algorithm by default randomly initialises the phases and then iterates forward and inverse STFT operations to estimate the phases.

Looking at your code, to reconstruct the signal, you'd just need to do:

import numpy as np

# X is the complex STFT computed in generate_spectrogram above
X_inv = librosa.griffinlim(np.abs(X))

This is just an example, of course. As pointed out by @PaulR, in your case you'd need to load the data from the jpeg files (which are lossy!) and then first apply the inverse of amplitude_to_db (librosa.db_to_amplitude).

The algorithm, especially the phase estimation, can be further improved thanks to advances in artificial neural networks. Here is one paper that discusses some enhancements.
