Is my output of librosa MFCC correct? I think I get the wrong number of frames when using librosa MFCC


Problem description


import librosa

result = librosa.feature.mfcc(y=signal, sr=16000, n_mfcc=13, n_fft=2048, hop_length=400)
print(result.shape)  # shape is an attribute (a tuple), not a method

The signal is 1 second long with a sampling rate of 16000 Hz, and I compute 13 MFCCs with a hop length of 400. The output dimensions are (13, 41). Why do I get 41 frames? Isn't it supposed to be (time*sr/hop_length)=40?

Solution

TL;DR answer

Yes, it is correct.

Long answer

You are using a time series as input (signal), which means that librosa first computes a mel spectrogram using the melspectrogram function. That function takes a number of arguments, of which you have already specified one (n_fft). It's important to note that melspectrogram also accepts the two parameters center and pad_mode, with the default values True and "reflect" respectively (more recent librosa releases default pad_mode to "constant", i.e. zero padding; that changes the padded values, not the number of frames).
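To make the padding concrete, here is a minimal pure-Python sketch of reflection padding. The helper name `reflect_pad` is hypothetical and this is not librosa's own code; it mirrors `pad` samples at each edge without repeating the edge sample itself, which is what `numpy.pad(..., mode="reflect")` does. Whichever padding mode is used, centering a 16000-sample signal with n_fft=2048 adds n_fft // 2 = 1024 samples on each side.

```python
def reflect_pad(y: list, pad: int) -> list:
    """Mirror `pad` samples at each edge, excluding the edge sample itself.

    Matches numpy.pad(y, pad, mode="reflect") for 0 <= pad < len(y).
    """
    if not 0 <= pad < len(y):
        raise ValueError("this sketch requires 0 <= pad < len(y)")
    left = y[1:pad + 1][::-1]     # y[pad], ..., y[1]
    right = y[-pad - 1:-1][::-1]  # y[-2], ..., y[-pad - 1]
    return left + y + right


print(reflect_pad([0, 1, 2, 3, 4], 2))  # [2, 1, 0, 1, 2, 3, 4, 3, 2]

n_fft = 2048
signal = [0.0] * 16000               # stand-in for 1 s of audio at 16 kHz
padded = reflect_pad(signal, n_fft // 2)
print(len(padded))                   # 18048
```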

From the docs:

pad_mode: string: If center=True, the padding mode to use at the edges of the signal. By default, STFT uses reflection padding.

center: boolean: If True, the signal y is padded so that frame t is centered at y[t * hop_length]. If False, then frame t begins at y[t * hop_length]

In other words, by default, librosa makes your signal longer (pads) in order to support centering.
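The two frame placements described in the docs can be written out directly. The hypothetical helper below (not part of librosa) returns the half-open sample range [start, stop) that frame t covers, expressed in the unpadded signal's indices and assuming padding of n_fft // 2 samples on each side when centering. Negative indices fall inside the left padding.

```python
from __future__ import annotations


def frame_span(t: int, n_fft: int, hop_length: int, center: bool) -> tuple[int, int]:
    """Half-open range [start, stop) of frame t in the unpadded signal's indices."""
    start = t * hop_length   # center=False: frame t begins at y[t * hop_length]
    if center:
        start -= n_fft // 2  # center=True: shift left so y[t * hop_length] is the midpoint
    return start, start + n_fft


for t in range(3):
    centered = frame_span(t, n_fft=2048, hop_length=400, center=True)
    plain = frame_span(t, n_fft=2048, hop_length=400, center=False)
    print(t, centered, plain)
# 0 (-1024, 1024) (0, 2048)
# 1 (-624, 1424) (400, 2448)
# 2 (-224, 1824) (800, 2848)
```

The first centered frame already reaches 1024 samples to the left of y[0], which is why the signal has to be padded.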

If you'd like to avoid this behavior, you should pass center=False to your mfcc call; mfcc forwards extra keyword arguments to melspectrogram.

That said, when setting center to False, keep in mind that with an n_fft of 2048 and a hop length of 400 you still won't get (time*sr/hop_length)=40 frames, because you have to account for the window length as well as the hop length (unless you pad somehow). The hop length only specifies how many samples the window moves between consecutive frames.

To give an extreme example, consider a very large window and a very short hop length: Assume 10 samples (e.g. time=1s, sr=10Hz), a window length of n_fft=9 and hop_length=1 with center=False. Now imagine sliding the window over the 10 samples.

   ◼︎◼︎◼︎◼︎◼︎◼︎◼︎◼︎◼︎◻︎
   ◻︎◼︎◼︎◼︎◼︎◼︎◼︎◼︎◼︎◼︎
t  0123456789

◻︎ sample not covered by window
◼︎ sample covered by window

At first the window starts at t=0 and ends at t=8. How many times can we shift it by hop_length and still expect it to not run out of samples? Exactly once, until it starts at t=1 and ends at t=9. Add the first unshifted one and you arrive at 2 frames. This is obviously different from the incorrect (time*sr/hop_length)=1*10/1=10.
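The counting in the paragraph above can be checked by brute force. This hypothetical helper slides a window of n_fft samples forward by hop_length for as long as the whole window still fits (center=False, no padding) and records the first and last sample index of each position.

```python
def window_positions(n_samples: int, n_fft: int, hop_length: int) -> list:
    """(first, last) sample index of every full window, without padding."""
    positions = []
    start = 0
    while start + n_fft <= n_samples:  # stop once the window would run past the end
        positions.append((start, start + n_fft - 1))
        start += hop_length
    return positions


print(window_positions(10, 9, 1))       # [(0, 8), (1, 9)]
print(len(window_positions(10, 9, 1)))  # 2
```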

The correct count is (time*sr-n_fft)//hop_length+1=(1*10-9)//1+1=2, with // denoting Python-style integer division.
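The same formula as a small Python function (the name `n_frames_no_center` is hypothetical). Applied to the question's parameters with center=False, it predicts 35 frames rather than 40; that number is worked out from the formula, not taken from a librosa run. Returning 0 when the signal is shorter than the window is only a convention of this sketch.

```python
def n_frames_no_center(n_samples: int, n_fft: int, hop_length: int) -> int:
    """Number of full windows when the signal is not padded (center=False)."""
    if n_samples < n_fft:
        return 0  # convention of this sketch: no full window fits
    return (n_samples - n_fft) // hop_length + 1


print(n_frames_no_center(1 * 10, 9, 1))          # 2, the toy example
print(n_frames_no_center(1 * 16000, 2048, 400))  # 35, the question's parameters
print(1 * 16000 // 400)                          # 40, the naive time*sr/hop_length
```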

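When using the default, i.e. center=True, the signal is padded with n_fft // 2 samples on both ends, so n_fft falls out of the equation: (time*sr + 2*(n_fft//2) - n_fft)//hop_length+1, which for an even n_fft reduces to time*sr//hop_length+1 = 16000//400+1 = 41. Those are exactly the 41 frames you got.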
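A sketch of the centered case (hypothetical function name, pure Python): pad by n_fft // 2 on both ends, then apply the uncentered count to the longer signal. For an even n_fft the result depends only on the signal length and the hop length.

```python
def n_frames_centered(n_samples: int, n_fft: int, hop_length: int) -> int:
    """Frame count when the signal is padded by n_fft // 2 on each side (center=True)."""
    padded_len = n_samples + 2 * (n_fft // 2)
    return (padded_len - n_fft) // hop_length + 1


print(n_frames_centered(16000, 2048, 400))  # 41, matching the observed (13, 41)
print(1 + 16000 // 400)                     # 41, the shortcut for even n_fft

# n_fft drops out: any even window size gives the same count
print({n_frames_centered(16000, n_fft, 400) for n_fft in (512, 1024, 2048, 4096)})  # {41}
```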
