How do I apply a binary mask and STFT to produce an audio file?


Question

So here's the idea: you can generate a spectrogram from an audio file using the short-time Fourier transform (STFT). Some people then generate something called a "binary mask" and use it with the inverse STFT to produce different audio (e.g. with background noise removed).

Here's my understanding:

The STFT is a simple equation applied to the audio file, and it produces information that can easily be displayed as a spectrogram. By taking the inverse of the STFT matrix and multiplying it by a matrix of the same size (the binary matrix), you can create a new matrix with the information needed to generate an audio file with the masked sound.

Once the matrix multiplication is done, how do I create the new audio file?

It's not much, but here's what I've got in terms of code:

from librosa import load
from librosa.core import stft, istft
y, sample_rate = load('1.wav')
spectrum = stft(y)
back_y = istft(spectrum)

Thank you. Here are some slides that got me this far. I'd appreciate it if you could give me an example/demo in Python.

Recommended answer

Librosa's STFT is full-featured (by default it works on overlapping, windowed frames), so unless you're very careful with how you manipulate the spectrum, you won't get a sensible output from its istft.

Here's a pair of functions, stft and istft, that I wrote from scratch to represent the forward and inverse STFT, along with a helper function, stftbins, that gives you the time and frequency locations of each pixel in the STFT array, plus a demo:

import numpy as np
import numpy.fft as fft


def stft(x, Nwin, Nfft=None):
    """
    Short-time Fourier transform: convert a 1D vector to a 2D array

    The short-time Fourier transform (STFT) breaks a long vector into disjoint
    chunks (no overlap) and runs an FFT (Fast Fourier Transform) on each chunk.

    The resulting 2D array can 

    Parameters
    ----------
    x : array_like
        Input signal (expected to be real)
    Nwin : int
        Length of each window (chunk of the signal). Should be ≪ `len(x)`.
    Nfft : int, optional
        Zero-pad each chunk to this length before FFT. Should be ≥ `Nwin`,
        (usually with small prime factors, for fastest FFT). Default: `Nwin`.

    Returns
    -------
    out : complex ndarray
        `len(x) // Nwin` by `Nfft // 2 + 1` complex array representing the
        STFT of `x` (`rfft` keeps only the non-negative frequencies).

    See also
    --------
    istft : inverse function (convert a STFT array back to a data vector)
    stftbins : time and frequency bins corresponding to `out`
    """
    Nfft = Nfft or Nwin
    Nwindows = x.size // Nwin
    # reshape into array `Nwin` wide, and as tall as possible. This is
    # optimized for C-order (row-major) layouts.
    arr = np.reshape(x[:Nwindows * Nwin], (-1, Nwin))
    stft = fft.rfft(arr, Nfft)
    return stft


def stftbins(x, Nwin, Nfft=None, d=1.0):
    """
    Time and frequency bins corresponding to short-time Fourier transform.

    Call this with the same arguments as `stft`, plus one extra argument, `d`
    (the sample spacing), to get the time and frequency axes that the output
    of `stft` corresponds to.

    Parameters
    ----------
    x : array_like
        same as `stft`
    Nwin : int
        same as `stft`
    Nfft : int, optional
        same as `stft`
    d : float, optional
        Sample spacing of `x` (or 1 / sample frequency), units of seconds.
        Default: 1.0.

    Returns
    -------
    t : ndarray
        Array of length `len(x) // Nwin`, in units of seconds, corresponding to
        the first dimension (height) of the output of `stft`.
    f : ndarray
        Array of length `Nfft // 2 + 1`, in units of Hertz, corresponding to
        the second dimension (width) of the output of `stft`.
    """
    Nfft = Nfft or Nwin
    Nwindows = x.size // Nwin
    t = np.arange(Nwindows) * (Nwin * d)
    f = fft.rfftfreq(Nfft, d)
    return t, f


def istft(stftArr, Nwin):
    """
    Inverse short-time Fourier transform (ISTFT)

    Given an array representing the output of `stft`, convert it back to the
    original samples.

    Parameters
    ----------
    stftArr : ndarray
        Output of `stft` (or something the same size)
    Nwin : int
        Same input as `stft`: length of each chunk that the STFT was calculated
        over.

    Returns
    -------
    y : ndarray
        Data samples corresponding to STFT data.

    See also
    --------
    stft : the forward transform
    """
    arr = fft.irfft(stftArr)[:, :Nwin]
    return np.reshape(arr, -1)


if __name__ == '__main__':
    sampleRate = 100.0  # Hertz
    N = 1024
    Nwin = 64

    # Generate a chirp: start frequency at 5 Hz and going down at 2 Hz/s
    time = np.arange(N) / sampleRate  # seconds
    x = np.cos(2 * np.pi * time * (5 - 2 * 0.5 * time))

    # Test with Nfft bigger than Nwin
    Nfft = Nwin * 2
    s = stft(x, Nwin, Nfft=Nfft)
    y = istft(s, Nwin)

    # Make sure the stft and istft are inverses. Caveat: `x` and `y` won't be
    # the same length if `N/Nwin` isn't integral!
    maxerr = np.max(np.abs(x - y))
    assert (maxerr < np.spacing(1) * 10)

    # Test `stftbins`
    t, f = stftbins(x, Nwin, Nfft=Nfft, d=1 / sampleRate)
    assert (len(t) == s.shape[0])
    assert (len(f) == s.shape[1])

    try:
        import pylab as plt
        plt.imshow(np.abs(s), aspect="auto", extent=[f[0], f[-1], t[-1], t[0]])
        plt.xlabel('frequency (Hertz)')
        plt.ylabel('time (seconds (start of chunk))')
        plt.title('STFT with chirp example')
        plt.show()
    except ModuleNotFoundError:
        pass

This is in a gist if that's easier for you to read.

The entire module assumes real-only data and uses NumPy's rfft functions. You can definitely generalize this to complex data (or use librosa), but for your application (audio masking), the real-only transforms make it easier to ensure that everything works out and that the output of the inverse STFT is real-only. It's easy to mess this up with the fully general complex STFT, where you need to be careful to maintain the symmetries of the spectrum.
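To illustrate the point about symmetries, here is a minimal sketch on a single chunk of random data (the chunk length and the bin ranges are arbitrary choices for illustration, not part of the module above). Zeroing bins of an rfft still inverts to a real signal, while zeroing only the positive-frequency bins of a full complex fft does not:

```python
import numpy as np

rng = np.random.default_rng(0)
chunk = rng.standard_normal(64)      # one real-valued chunk of "audio"

# Real-only path: rfft stores only the non-negative frequencies, so any
# scaling of its bins still inverts to a real signal.
half = np.fft.rfft(chunk)
half[10:] = 0                        # zero everything from bin 10 upwards
real_out = np.fft.irfft(half, n=chunk.size)

# General complex path: zeroing bins 10..32 without also zeroing their
# mirror images (bins 33..54) breaks the conjugate symmetry.
full = np.fft.fft(chunk)
full[10:33] = 0
complex_out = np.fft.ifft(full)

print(np.isrealobj(real_out))             # the rfft round trip stays real
print(np.max(np.abs(complex_out.imag)))   # noticeably non-zero imaginary part
```

With the complex transform you would have to apply the same mask to each bin and its mirror to get a real result back; the rfft-based functions take care of that implicitly.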

The demo first generates some test data and confirms that applying istft to the stft of the data reproduces the data. The test data is a chirp that starts at 5 Hz and goes down at 2 Hz per second, so over ~10 seconds of data the chirp's frequency passes through zero, wraps around, and ends up at around 15 Hz. The demo then plots the magnitude of the STFT (the absolute value of the STFT array).
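As a rough check of the "ends up at around 15 Hz" claim, the sketch below rebuilds the demo's chirp and compares the analytic instantaneous frequency with the strongest bin of each chunk. The names `inst_freq` and `peak_freq` are introduced here for illustration only:

```python
import numpy as np

sampleRate = 100.0                   # Hertz, as in the demo
N, Nwin = 1024, 64
time = np.arange(N) / sampleRate
x = np.cos(2 * np.pi * time * (5 - 2 * 0.5 * time))

# The phase is 2*pi*(5*t - t**2); the instantaneous frequency is its time
# derivative divided by 2*pi, i.e. f(t) = 5 - 2*t.
inst_freq = 5 - 2 * time

# A real signal cannot tell -f from +f, so the spectrogram shows |f(t)|.
s = np.fft.rfft(x.reshape(-1, Nwin))
f = np.fft.rfftfreq(Nwin, d=1 / sampleRate)
peak_freq = f[np.argmax(np.abs(s), axis=1)]   # strongest frequency per chunk

print(abs(inst_freq[-1]), peak_freq[-1])
```

The two numbers will only agree to within the frequency resolution of a chunk (sampleRate / Nwin, about 1.6 Hz here).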

So:

  1. put this code in a stft.py file,
  2. import it with import stft,
  3. compute an STFT as spectrum = stft.stft(y, 128),
  4. visualize your spectrum as shown in the demo (be sure to prepend stft. to the functions defined in stft.py!),
  5. pick which frequencies you want to attenuate/amplify and apply those effects to the spectrum array, and
  6. finally get the processed audio via back_y = stft.istft(spectrum, 128).
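The steps above can be sketched end to end as follows. This is only an illustration under several assumptions that are not in the original answer: a synthetic two-tone signal stands in for `y`, the sample rate is 8000 Hz, the mask is a simple 1500 Hz cutoff, the output name `masked.wav` is made up, and `stft`/`istft` are condensed copies of the functions above so the block runs on its own. The file is written with Python's standard-library `wave` module:

```python
import wave
import numpy as np

def stft(x, Nwin):
    """Condensed forward STFT: rfft of disjoint chunks of length Nwin."""
    Nwindows = x.size // Nwin
    return np.fft.rfft(np.reshape(x[:Nwindows * Nwin], (-1, Nwin)))

def istft(stftArr, Nwin):
    """Condensed inverse STFT: irfft of each row, then flatten."""
    return np.reshape(np.fft.irfft(stftArr, Nwin), -1)

sample_rate = 8000                          # Hz (assumed for this demo)
Nwin = 128
t = np.arange(sample_rate) / sample_rate    # one second of samples
# Stand-in for `y`: a 440 Hz tone plus an unwanted 3000 Hz tone
y = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 3000 * t)

spectrum = stft(y, Nwin)
freqs = np.fft.rfftfreq(Nwin, d=1 / sample_rate)

# Binary mask: 1 keeps a frequency bin, 0 removes it.
mask = (freqs < 1500).astype(float)
back_y = istft(spectrum * mask, Nwin)       # mask broadcasts over every chunk

# Convert to little-endian 16-bit PCM and write a mono WAV file
pcm = (np.clip(back_y, -1.0, 1.0) * 32767).astype('<i2')
with wave.open('masked.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)                # 2 bytes per sample
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(pcm.tobytes())
```

Because the chunks are disjoint and processed independently, an aggressive mask can leave small discontinuities at chunk boundaries, which may be audible as clicks; overlapping windows (as librosa uses) are the usual remedy.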

Masking/amplifying/attenuating frequency content just means scaling some bins of the spectrum array. If you have specific questions on how to do that, let us know. Hopefully this gives you a foolproof way of applying arbitrary effects.
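As a minimal sketch of what "scaling some bins" can look like, the example below builds a binary mask from the magnitudes themselves (keep only the strong bins) and then applies a non-binary gain. The test signal, the 10% threshold, and the bin ranges are assumptions chosen for illustration. Note that the mask is applied as an element-wise product with the STFT array, not as a matrix product:

```python
import numpy as np

rng = np.random.default_rng(0)
Nwin = 64
n = np.arange(16 * Nwin)
# A strong tone centred on bin 8, buried in weak white noise
x = np.cos(2 * np.pi * 8 * n / Nwin) + 0.05 * rng.standard_normal(n.size)

spectrum = np.fft.rfft(x.reshape(-1, Nwin))   # shape (16, 33)
magnitude = np.abs(spectrum)

# Binary mask: True where a bin is at least 10% as strong as the strongest
mask = magnitude >= 0.1 * magnitude.max()

# Element-wise product: masked-out bins become zero, the rest are unchanged
cleaned = np.fft.irfft(spectrum * mask, Nwin).reshape(-1)

# Amplifying or attenuating works the same way with non-binary gains
gains = np.ones(spectrum.shape[1])
gains[20:] = 0.5                              # halve bins 20 and above
attenuated = np.fft.irfft(spectrum * gains, Nwin).reshape(-1)
```

Here `cleaned` keeps the tone and drops almost all of the noise, while `attenuated` leaves the tone untouched and only turns down the upper part of the spectrum.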

If you really want to use librosa's functions, let us know and we can show you how to do that too.
