How to Merge MFCCs

Question

I am working on extracting MFCC features from some audio files. The program I currently have extracts a series of MFCCs for each file and takes a buffer size of 1024 as a parameter. I saw the following in a paper:

The feature vectors extracted within a second of audio data are combined by computing the mean and the variance of each feature vector element (merging).
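The merging in the quote can be written out as formulas. Assume that N MFCC vectors $\mathbf{c}_1,\dots,\mathbf{c}_N$, each with K coefficients, fall within the one-second window. Also assume the population (1/N) variance; the paper may divide by N − 1 instead:

$$
\mu_k = \frac{1}{N}\sum_{n=1}^{N} c_{n,k}, \qquad
\sigma_k^2 = \frac{1}{N}\sum_{n=1}^{N}\left(c_{n,k}-\mu_k\right)^2, \qquad k = 1,\dots,K
$$

The merged feature vector for that second is then $(\mu_1,\dots,\mu_K,\sigma_1^2,\dots,\sigma_K^2)$. With 13 coefficients, that is 26 values per second.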

My current code uses TarsosDSP to extract the MFCCs, but I'm not sure how to split the data into "a second of audio data" in order to merge the MFCCs.

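```java
import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.io.TarsosDSPAudioFormat;
import be.tarsos.dsp.io.UniversalAudioInputStream;
import be.tarsos.dsp.mfcc.MFCC;
import java.io.FileInputStream;
import java.io.InputStream;

int sampleRate = 44100;
int bufferSize = 1024;    // samples per analysis frame; one MFCC vector per frame
int bufferOverlap = 512;  // samples shared by consecutive frames
InputStream inStream = new FileInputStream(path);  // path: the audio file to analyse
// Reads the stream as raw 16-bit, mono, signed, big-endian PCM
// (WAV data is little-endian, so for .wav input the last argument would normally be false).
AudioDispatcher dispatcher = new AudioDispatcher(new UniversalAudioInputStream(inStream, new TarsosDSPAudioFormat(sampleRate, 16, 1, true, true)), bufferSize, bufferOverlap);
// 13 cepstral coefficients, 40 mel filters, filter bank from 300 Hz to 3000 Hz
final MFCC mfcc = new MFCC(bufferSize, sampleRate, 13, 40, 300, 3000);
dispatcher.addAudioProcessor(mfcc);
dispatcher.addAudioProcessor(new AudioProcessor() {
    @Override
    public void processingFinished() {
        System.out.println("DONE");
    }
    @Override
    public boolean process(AudioEvent audioEvent) {
        // mfcc.getMFCC() holds the coefficients of the current frame at this point
        return true;  // breakpoint here reveals MFCC data
    }
});
dispatcher.run();
```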

What exactly is the buffer size, and could it be used to segment the audio into one-second windows? Is there a way to divide the series of MFCCs into fixed amounts of time?

Any help would be greatly appreciated.
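The buffer size is the number of samples in each analysis frame, and one MFCC vector is computed per frame. My reading of TarsosDSP's AudioDispatcher is that it starts each new frame bufferSize minus bufferOverlap samples after the previous one. If that holds, the number of MFCC vectors per second follows from simple arithmetic. The class name `FrameRate` is hypothetical, and the sketch does not depend on TarsosDSP:

```java
// Frame-rate arithmetic for an overlapping frame dispatcher.
// Assumption: each new frame starts (bufferSize - bufferOverlap) samples after the previous one.
class FrameRate {
    /** Number of samples between the starts of consecutive frames (the hop size). */
    static int hopSize(int bufferSize, int bufferOverlap) {
        return bufferSize - bufferOverlap;
    }

    /** Average number of frames, and therefore MFCC vectors, produced per second of audio. */
    static double framesPerSecond(int sampleRate, int bufferSize, int bufferOverlap) {
        return (double) sampleRate / hopSize(bufferSize, bufferOverlap);
    }

    /** Length of one analysis frame in seconds. */
    static double frameDurationSeconds(int sampleRate, int bufferSize) {
        return (double) bufferSize / sampleRate;
    }

    public static void main(String[] args) {
        int sampleRate = 44100, bufferSize = 1024, bufferOverlap = 512;  // values from the question
        System.out.printf("hop = %d samples%n", hopSize(bufferSize, bufferOverlap));
        System.out.printf("frame length = %.4f s%n", frameDurationSeconds(sampleRate, bufferSize));
        System.out.printf("frames per second = %.2f%n", framesPerSecond(sampleRate, bufferSize, bufferOverlap));
    }
}
```

With the question's settings this prints a hop of 512 samples, a frame length of about 0.0232 s, and about 86.13 frames per second. Each one-second window would therefore hold roughly 86 MFCC vectors. A bufferSize of 44100 would yield one MFCC per second. However, MFCCs are normally computed over short frames of a few tens of milliseconds, so grouping the short-frame vectors afterwards seems the better fit for the paper's description.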

Answer

After more research, I came across this website that clearly showed the steps for using MFCCs with Weka. It showed some data files with various statistics, each listed as a separate attribute in Weka. I believe when the paper said

computing the mean and variance

they meant that the mean and variance of each MFCC coefficient were used as attributes in the combined data file. When I followed the example on the website to merge the MFCCs, I used max, min, range, max position, min position, mean, standard deviation, skewness, kurtosis, quartiles, and interquartile range.
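Below is a minimal sketch of most of these statistics for one coefficient's values across the frames of a window. Some conventions in it are my own assumptions and may differ from the website's tool: population (1/N) moments, non-excess kurtosis, and first occurrence for the min/max positions. Quartiles and the interquartile range are left out because several quartile definitions are in use. `CoefficientStats` is a hypothetical name:

```java
// Summary statistics for one MFCC coefficient tracked across the frames of one window.
// Assumptions: population (1/N) moments, non-excess kurtosis, first occurrence for min/max position.
class CoefficientStats {
    final double mean, variance, stdDev, min, max, range, skewness, kurtosis;
    final int minPos, maxPos;

    CoefficientStats(double[] x) {
        if (x.length == 0) throw new IllegalArgumentException("empty series");
        int n = x.length;
        double sum = 0;
        int lo = 0, hi = 0;  // indices of the first minimum and first maximum
        for (int i = 0; i < n; i++) {
            sum += x[i];
            if (x[i] < x[lo]) lo = i;
            if (x[i] > x[hi]) hi = i;
        }
        mean = sum / n;
        // Second, third and fourth central moments around the mean.
        double m2 = 0, m3 = 0, m4 = 0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
            m4 += d * d * d * d;
        }
        m2 /= n;
        m3 /= n;
        m4 /= n;
        variance = m2;
        stdDev = Math.sqrt(m2);
        min = x[lo];
        max = x[hi];
        range = max - min;
        minPos = lo;
        maxPos = hi;
        // Skewness and kurtosis are undefined for a constant series; report 0 in that case.
        skewness = m2 == 0 ? 0 : m3 / Math.pow(m2, 1.5);
        kurtosis = m2 == 0 ? 0 : m4 / (m2 * m2);
    }

    public static void main(String[] args) {
        // Hypothetical values of one coefficient over four frames.
        CoefficientStats s = new CoefficientStats(new double[]{1.0, 2.0, 3.0, 4.0});
        System.out.println("mean=" + s.mean + " variance=" + s.variance + " range=" + s.range);
    }
}
```

To produce the attribute values for one window, run this once per coefficient; with the question's settings that is 13 times.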

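To split the audio input into seconds, I believe one set of MFCCs is extracted per buffer, and consecutive buffers start bufferSize minus bufferOverlap samples apart. The number of MFCC sets in one second is then the sample rate divided by that hop size, and I would wait for that many frames before merging the MFCCs. Please correct me if I'm wrong.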
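The grouping step can be sketched as follows. The sketch assumes each MFCC vector is stored together with its frame start time in seconds. In TarsosDSP I would expect to get these inside `process()` from `mfcc.getMFCC()` and `audioEvent.getTimeStamp()`, copying the array to be safe. Frames are assigned to a window by their start time. Each window is then merged into its per-coefficient means followed by its per-coefficient (population) variances, following the quote from the paper. `OneSecondMerger` is a hypothetical name, and `List.of` needs Java 9 or later:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Groups per-frame MFCC vectors into consecutive fixed-length windows by frame start time and
// merges each window into [mean_1..mean_K, variance_1..variance_K]. Timestamps must be non-decreasing.
class OneSecondMerger {

    static List<double[]> merge(List<double[]> frames, List<Double> timestamps, double windowSeconds) {
        List<double[]> merged = new ArrayList<>();
        List<double[]> window = new ArrayList<>();
        long currentWindow = -1;
        for (int i = 0; i < frames.size(); i++) {
            long w = (long) Math.floor(timestamps.get(i) / windowSeconds);  // window index of this frame
            if (w != currentWindow && !window.isEmpty()) {
                merged.add(meanAndVariance(window));  // close the previous window
                window.clear();
            }
            currentWindow = w;
            window.add(frames.get(i));
        }
        if (!window.isEmpty()) merged.add(meanAndVariance(window));  // trailing (possibly partial) window
        return merged;
    }

    static double[] meanAndVariance(List<double[]> window) {
        int k = window.get(0).length, n = window.size();
        double[] out = new double[2 * k];  // first k slots: means, last k slots: variances
        for (double[] f : window)
            for (int j = 0; j < k; j++) out[j] += f[j] / n;
        for (double[] f : window)
            for (int j = 0; j < k; j++) {
                double d = f[j] - out[j];
                out[k + j] += d * d / n;
            }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 2-coefficient frames starting at 0.0 s, 0.5 s, 1.0 s and 1.5 s.
        List<double[]> frames = List.of(new double[]{1, 10}, new double[]{3, 10},
                                        new double[]{5, 20}, new double[]{5, 40});
        List<Double> times = List.of(0.0, 0.5, 1.0, 1.5);
        for (double[] v : merge(frames, times, 1.0))
            System.out.println(Arrays.toString(v));
    }
}
```

With the example frames this prints `[2.0, 10.0, 1.0, 0.0]` and `[5.0, 30.0, 0.0, 100.0]`. A trailing partial second is merged as well; dropping it instead is an equally reasonable choice.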