Plotting spectrogram in audio analysis

Question

I am working on speech recognition using a neural network. To do so, I need to compute the spectrograms of my training audio files (.wav). How can I get those spectrograms in Python?

Recommended answer

There are numerous ways to do this. The easiest is to check out the methods proposed in the Kernels of the Kaggle competition TensorFlow Speech Recognition Challenge (just sort by most voted). One of them is particularly clear and simple and contains the following function. Its inputs are a numeric vector of samples extracted from the wav file, the sample rate, the frame (window) size in milliseconds, the step (stride or hop) size in milliseconds, and a small offset (eps) that is added before taking the logarithm so that zero-power bins never produce log(0).

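from scipy.io import wavfile
from scipy import signal
import numpy as np

# path_to_wav_file: path to one of your training .wav files
sample_rate, audio = wavfile.read(path_to_wav_file)

def log_specgram(audio, sample_rate, window_size=20,
                 step_size=10, eps=1e-10):
    # Mix multichannel (e.g. stereo) audio down to mono
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    # Convert window and step lengths from milliseconds to samples
    nperseg = int(round(window_size * sample_rate / 1e3))
    step = int(round(step_size * sample_rate / 1e3))
    # signal.spectrogram takes the overlap between windows, not the step
    noverlap = nperseg - step
    freqs, times, spec = signal.spectrogram(audio,
                                    fs=sample_rate,
                                    window='hann',
                                    nperseg=nperseg,
                                    noverlap=noverlap,
                                    detrend=False)
    # Transpose to (time, frequency) and log-compress; eps avoids log(0)
    return freqs, times, np.log(spec.T.astype(np.float32) + eps)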
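A quick way to sanity-check the millisecond-to-sample conversion is to run `signal.spectrogram` on a synthetic signal. The sketch below assumes a 1-second, 440 Hz tone sampled at 16 kHz in place of a real .wav file (the tone and the rate are illustrative choices, not part of the original answer). Note that SciPy's `noverlap` is the number of samples shared by consecutive windows, so the hop between frames is `nperseg - noverlap`. With a 20 ms window and a 10 ms step both quantities happen to be 160 samples, which is why passing the step directly as `noverlap` only gives the intended result for those particular defaults.

```python
import numpy as np
from scipy import signal

fs = 16000                       # assumed sample rate (Hz), common for speech corpora
t = np.arange(fs) / fs           # 1 second of sample times
audio = np.sin(2 * np.pi * 440 * t).astype(np.float32)  # synthetic 440 Hz tone

window_ms, step_ms = 20, 10
nperseg = int(round(window_ms * fs / 1e3))   # 320 samples per window
step = int(round(step_ms * fs / 1e3))        # 160 samples between window starts
noverlap = nperseg - step                    # scipy expects the overlap, not the step

freqs, times, spec = signal.spectrogram(audio, fs=fs, window='hann',
                                        nperseg=nperseg, noverlap=noverlap,
                                        detrend=False)
log_spec = np.log(spec.T + 1e-10)            # (time, frequency), like log_specgram returns

hop_ms = (times[1] - times[0]) * 1e3         # spacing between frame centres
peak_hz = freqs[log_spec.mean(axis=0).argmax()]  # strongest frequency bin overall
print(spec.shape, log_spec.shape)   # (161, 99) (99, 161)
print(round(hop_ms, 3))             # 10.0
print(int(round(peak_hz)))          # 450
```

With a bin spacing of 16000 / 320 = 50 Hz, the 440 Hz tone lands in the 450 Hz bin, and `times` advances by exactly the 10 ms step.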

The outputs are as defined in the SciPy manual, except that the spectrogram is transposed and rescaled with a monotonic function, the natural logarithm. The log compresses large values much more than small ones while keeping larger values larger than smaller ones, so no extreme value in spec will dominate the computation. Alternatively, one can cap the values at some quantile, but log (or even square root) is usually preferred. There are many other ways to normalize the heights of the spectrogram, i.e. to prevent extreme values from "bullying" the output :)

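freqs (f) : ndarray, Array of sample frequencies.
times (t) : ndarray, Array of segment times.
spec (Sxx) : ndarray, Spectrogram of x. By default, the last axis of Sxx corresponds to the segment times; because log_specgram returns spec.T, the first axis of its returned array is time and the second is frequency.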
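The rescaling options mentioned above can be compared on a toy example. The sketch below uses hypothetical power values (999 ordinary bins spread between 1e-6 and 1e-2, plus one extreme bin of 1e3 standing in for a loud transient) and measures, for each transform, where the median lands after min-max scaling to [0, 1]. A value near 0 means the single extreme value squeezes everything else into the bottom of the range. The 99th-percentile cap and the helper `normalized_median` are arbitrary illustrative choices, not part of the original answer.

```python
import numpy as np

# Hypothetical power values: 999 ordinary bins spread over 1e-6..1e-2,
# plus one extreme bin (1e3) standing in for a loud transient
spec = np.append(np.logspace(-6, -2, 999), 1e3)
eps = 1e-10  # small offset so log(0) can never occur

def normalized_median(x):
    """Median after min-max scaling to [0, 1]; near 0 means one value dominates."""
    return (np.median(x) - x.min()) / (x.max() - x.min())

raw = spec
log_spec = np.log(spec + eps)                        # strong compression
sqrt_spec = np.sqrt(spec)                            # milder compression
capped = np.minimum(spec, np.quantile(spec, 0.99))   # clip at the 99th percentile

for name, x in [("raw", raw), ("sqrt", sqrt_spec), ("capped", capped), ("log", log_spec)]:
    print(f"{name:>6}: {normalized_median(x):.2e}")
```

On this data the scaled median rises from about 1e-7 for the raw values, through sqrt and the quantile cap, to about 0.2 with log. None of the transforms reverses the order of the bins; capping only merges the values above the cap.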

Alternatively, you can check the train.py and models.py code in the GitHub repository of the TensorFlow example on audio recognition.

Here is another thread that explains how to build spectrograms in Python and gives code for it.
