分析"&哨QUOT;响俯仰/笔记 [英] Analyze "whistle" sound for pitch/note

查看:167
本文介绍了分析"&哨QUOT;响俯仰/笔记的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想建立一个系统,将能够处理的人吹口哨和输出笔记记录。

I am trying to build a system that will be able to process a record of someone whistling and output notes.

谁能推荐一个开源的平台,我可以为基地的音符/音高的识别和波形文件的分析使用?

Can anyone recommend an open-source platform which I can use as the base for the note/pitch recognition and analysis of wave files ?

在此先感谢

推荐答案

正如许多人已经说了,FFT是去这里的方式。我已经使用FFT code从 http://www.cs.princeton.edu用Java编写的一个小例子/ introcs / 97data / 。为了运行它,你将需要从该页面复杂类也(见确切URL来源)。

As many others have already said, FFT is the way to go here. I've written a little example in Java using FFT code from http://www.cs.princeton.edu/introcs/97data/. In order to run it, you will need the Complex class from that page also (see the source for the exact URL).

在code在文件中读取,去窗口明智过它,而且每个窗口上的FFT。为每个FFT它看起来对于最大系数,并输出相应的频率。这样确实非常​​好干净的信号就像一个正弦波,但实际的鸣笛声,你可能不得不增加更多。我测试过的几个文件,吹口哨我创造了我自己(用我的笔记本电脑的集成麦克风)时,code确实得到了什么事情的想法,但为了得到实际音符,更需要做的事情

The code reads in a file, goes window-wise over it and does an FFT on each window. For each FFT it looks for the maximum coefficient and outputs the corresponding frequency. This does work very well for clean signals like a sine wave, but for an actual whistle sound you probably have to add more. I've tested with a few files with whistling I created myself (using the integrated mic of my laptop computer), the code does get the idea of what's going on, but in order to get actual notes more needs to be done.

1),你可能需要一些更智能窗口技术。我什么code现在采用的是一个简单的矩形窗口。由于在FFT假定输入葛可以周期性持续,当第一个和最后样本中的窗口不匹配被检测到额外的频率。这被称为频谱泄漏(<一href=\"http://en.wikipedia.org/wiki/Spectral_leakage\">http://en.wikipedia.org/wiki/Spectral_leakage ),通常是使用一个窗口,在开始下行加权样品和窗口的结束(的http://en.wikipedia。组织/维基/ Window_function )。虽然泄漏不会引起错误的频率,使用一个窗口会增加检测质量被检测为最大值,

1) You might need some more intelligent window technique. What my code uses now is a simple rectangular window. Since the FFT assumes that the input singal can be periodically continued, additional frequencies are detected when the first and the last sample in the window don't match. This is known as spectral leakage ( http://en.wikipedia.org/wiki/Spectral_leakage ), usually one uses a window that down-weights samples at the beginning and the end of the window ( http://en.wikipedia.org/wiki/Window_function ). Although the leakage shouldn't cause the wrong frequency to be detected as the maximum, using a window will increase the detection quality.

2)要匹配的频率,以实际的注释,则可以使用含有该频率的阵列(如440赫兹为a')和然后寻找最接近于已被识别的一个的频率。但是,如果是呼啸折的调整,这是不行的了。鉴于呼啸仍然是正确的,但只调整不同(如吉他或其他乐器,可以进行不同的调整和声音仍然好,只要调整为所有字符串始终做),你仍然可以通过查找笔记在所识别的频率的比率。您可以阅读<一href=\"http://en.wikipedia.org/wiki/Pitch_%28music%29\">http://en.wikipedia.org/wiki/Pitch_%28music%29作为一个起点。这也是有趣的:<一href=\"http://en.wikipedia.org/wiki/Piano_key_frequencies\">http://en.wikipedia.org/wiki/Piano_key_frequencies

2) To match the frequencies to actual notes, you could use an array containing the frequencies (like 440 Hz for a') and then look for the frequency that's closest to the one that has been identified. However, if the whistling is off standard tuning, this won't work any more. Given that the whistling is still correct but only tuned differently (like a guitar or other musical instrument can be tuned differently and still sound "good", as long as the tuning is done consistently for all strings), you could still find notes by looking at the ratios of the identified frequencies. You can read http://en.wikipedia.org/wiki/Pitch_%28music%29 as a starting point on that. This is also interesting: http://en.wikipedia.org/wiki/Piano_key_frequencies

3)此外,当每个单独的音调启动和停止这可能是有趣的时间来检测点。这可以被添加作为一个$ P $对 - 处理步骤。你可以做的每个单独的音符的FFT即可。但是,如果哨声并没有停止,而是音符之间只是弯曲,这不会是那么容易。

3) Moreover it might be interesting to detect the points in time when each individual tone starts and stops. This could be added as a pre-processing step. You could do an FFT for each individual note then. However, if the whistler doesn't stop but just bends between notes, this would not be that easy.

绝对看看别人建议的库。我不知道任何人,但也许他们已经包含功能做什么我上面描述。

Definitely have a look at the libraries the others suggested. I don't know any of them, but maybe they contain already functionality for doing what I've described above.

现在到了code。请让我知道什么对你的工作,我觉得这个题目pretty有趣。

And now to the code. Please let me know what worked for you, I find this topic pretty interesting.

编辑:我更新了code,包括重叠和频率的简单映射到音符。它仅适用于调谐哨声虽然,如上所述

I updated the code to include overlapping and a simple mapper from frequencies to notes. It works only for "tuned" whistlers though, as mentioned above.

package de.ahans.playground;

import java.io.File;
import java.io.IOException;
import java.util.Arrays;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class FftMaxFrequency {

    // taken from http://www.cs.princeton.edu/introcs/97data/FFT.java.html
    // (first hit in Google for "java fft"
    // needs Complex class from http://www.cs.princeton.edu/introcs/97data/Complex.java
    public static Complex[] fft(Complex[] x) {
        int N = x.length;

        // base case
        if (N == 1) return new Complex[] { x[0] };

        // radix 2 Cooley-Tukey FFT
        if (N % 2 != 0) { throw new RuntimeException("N is not a power of 2"); }

        // fft of even terms
        Complex[] even = new Complex[N/2];
        for (int k = 0; k < N/2; k++) {
            even[k] = x[2*k];
        }
        Complex[] q = fft(even);

        // fft of odd terms
        Complex[] odd  = even;  // reuse the array
        for (int k = 0; k < N/2; k++) {
            odd[k] = x[2*k + 1];
        }
        Complex[] r = fft(odd);

        // combine
        Complex[] y = new Complex[N];
        for (int k = 0; k < N/2; k++) {
            double kth = -2 * k * Math.PI / N;
            Complex wk = new Complex(Math.cos(kth), Math.sin(kth));
            y[k]       = q[k].plus(wk.times(r[k]));
            y[k + N/2] = q[k].minus(wk.times(r[k]));
        }
        return y;
    }   

    static class AudioReader {
        private AudioFormat audioFormat;

        public AudioReader() {}

        public double[] readAudioData(File file) throws UnsupportedAudioFileException, IOException {
            AudioInputStream in = AudioSystem.getAudioInputStream(file);
            audioFormat = in.getFormat();
            int depth = audioFormat.getSampleSizeInBits();
            long length = in.getFrameLength();
            if (audioFormat.isBigEndian()) {
                throw new UnsupportedAudioFileException("big endian not supported");
            }
            if (audioFormat.getChannels() != 1) {
                throw new UnsupportedAudioFileException("only 1 channel supported");
            }

            byte[] tmp = new byte[(int) length];
            byte[] samples = null;      
            int bytesPerSample = depth/8;
            int bytesRead;
            while (-1 != (bytesRead = in.read(tmp))) {
                if (samples == null) {
                    samples = Arrays.copyOf(tmp, bytesRead);
                } else {
                    int oldLen = samples.length;
                    samples = Arrays.copyOf(samples, oldLen + bytesRead);
                    for (int i = 0; i < bytesRead; i++) samples[oldLen+i] = tmp[i];
                }
            }

            double[] data = new double[samples.length/bytesPerSample];

            for (int i = 0; i < samples.length-bytesPerSample; i += bytesPerSample) {
                int sample = 0;
                for (int j = 0; j < bytesPerSample; j++) sample += samples[i+j] << j*8;
                data[i/bytesPerSample] = (double) sample / Math.pow(2, depth);
            }

            return data;
        }

        public AudioFormat getAudioFormat() {
            return audioFormat;
        }
    }

    public class FrequencyNoteMapper {
        private final String[] NOTE_NAMES = new String[] {
                "A", "Bb", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"
            };
        private final double[] FREQUENCIES;
        private final double a = 440;
        private final int TOTAL_OCTAVES = 6;
        private final int START_OCTAVE = -1; // relative to A

        public FrequencyNoteMapper() {
            FREQUENCIES = new double[TOTAL_OCTAVES*12];
            int j = 0;
            for (int octave = START_OCTAVE; octave < START_OCTAVE+TOTAL_OCTAVES; octave++) {
                for (int note = 0; note < 12; note++) {
                    int i = octave*12+note;
                    FREQUENCIES[j++] = a * Math.pow(2, (double)i / 12.0);
                }
            }
        }

        public String findMatch(double frequency) {
            if (frequency == 0)
                return "none";

            double minDistance = Double.MAX_VALUE;
            int bestIdx = -1;

            for (int i = 0; i < FREQUENCIES.length; i++) {
                if (Math.abs(FREQUENCIES[i] - frequency) < minDistance) {
                    minDistance = Math.abs(FREQUENCIES[i] - frequency);
                    bestIdx = i;
                }
            }

            int octave = bestIdx / 12;
            int note = bestIdx % 12;

            return NOTE_NAMES[note] + octave;
        }
    }

    public void run (File file) throws UnsupportedAudioFileException, IOException {
        FrequencyNoteMapper mapper = new FrequencyNoteMapper();

        // size of window for FFT
        int N = 4096;
        int overlap = 1024;
        AudioReader reader = new AudioReader(); 
        double[] data = reader.readAudioData(file);

        // sample rate is needed to calculate actual frequencies
        float rate = reader.getAudioFormat().getSampleRate();

        // go over the samples window-wise
        for (int offset = 0; offset < data.length-N; offset += (N-overlap)) {
            // for each window calculate the FFT
            Complex[] x = new Complex[N];
            for (int i = 0; i < N; i++) x[i] = new Complex(data[offset+i], 0);
            Complex[] result = fft(x);

            // find index of maximum coefficient
            double max = -1;
            int maxIdx = 0;
            for (int i = result.length/2; i >= 0; i--) {
                if (result[i].abs() > max) {
                    max = result[i].abs();
                    maxIdx = i;
                }
            }
            // calculate the frequency of that coefficient
            double peakFrequency = (double)maxIdx*rate/(double)N;
            // and get the time of the start and end position of the current window
            double windowBegin = offset/rate;
            double windowEnd = (offset+(N-overlap))/rate;
            System.out.printf("%f s to %f s:\t%f Hz -- %s\n", windowBegin, windowEnd, peakFrequency, mapper.findMatch(peakFrequency));
        }       
    }

    public static void main(String[] args) throws UnsupportedAudioFileException, IOException {
        new FftMaxFrequency().run(new File("/home/axr/tmp/entchen.wav"));
    }
}

这篇关于分析&QUOT;&哨QUOT;响俯仰/笔记的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆