Java: How to get current frequency of audio input?


Problem Description


I want to analyse the current frequency of the microphone input to synchronize my LEDs with the music playing. I know how to capture the sound from the microphone, but I don't know about FFT, which I often came across while searching for a way to get the frequency.

I want to test whether the current volume of a certain frequency is greater than a set value. The code should look something like this:

 if (frequency > value) {
     // LEDs on
 } else {
     // LEDs off
 }

My problem is how to implement the FFT in Java. For better understanding, here is a link to a YouTube video that shows really well what I'm trying to achieve.

The whole code:

import javax.sound.sampled.*;
import java.io.IOException;

public class Music {

    static AudioFormat format;
    static DataLine.Info info;

    public static void input() {
        format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 2, 4, 44100, false);

        try {
            info = new DataLine.Info(TargetDataLine.class, format);
            final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
            targetLine.open();

            final AudioInputStream audioStream = new AudioInputStream(targetLine);

            final byte[] buf = new byte[256];

            Thread targetThread = new Thread() {
                public void run() {
                    targetLine.start();
                    try {
                        audioStream.read(buf);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            };

            targetThread.start();
        } catch (LineUnavailableException e) {
            e.printStackTrace();
        }
    }
}

Edit: I tried using the JavaFX AudioSpectrumListener of the MediaPlayer, which works really well as long as I use an .mp3 file. The problem is that I have to use a byte array in which I store the microphone input. I asked another question about this problem here.

Solution

Using the JavaFFT class from here (jipes, com/tagtraum/jipes/math/FFTFactory.java), you can do something like this:

import javax.sound.sampled.*;
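// Note: JavaFFT is not a JDK class; per the link in the answer it comes from the
// jipes library (com/tagtraum/jipes/math/FFTFactory.java)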

public class AudioLED {

    private static final float NORMALIZATION_FACTOR_2_BYTES = Short.MAX_VALUE + 1.0f;

    public static void main(final String[] args) throws Exception {
        // use only 1 channel, to make this easier
        final AudioFormat format = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 44100, 16, 1, 2, 44100, false);
        final DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        final TargetDataLine targetLine = (TargetDataLine) AudioSystem.getLine(info);
        targetLine.open();
        targetLine.start();
        final AudioInputStream audioStream = new AudioInputStream(targetLine);

        final byte[] buf = new byte[256]; // <--- increase this for higher frequency resolution
        final int numberOfSamples = buf.length / format.getFrameSize();
        final JavaFFT fft = new JavaFFT(numberOfSamples);
        while (true) {
            // in real impl, don't just ignore how many bytes you read
            audioStream.read(buf);
            // the stream represents each sample as two bytes -> decode
            final float[] samples = decode(buf, format);
            final float[][] transformed = fft.transform(samples);
            final float[] realPart = transformed[0];
            final float[] imaginaryPart = transformed[1];
            final double[] magnitudes = toMagnitudes(realPart, imaginaryPart);

            // do something with magnitudes...
        }
    }

    private static float[] decode(final byte[] buf, final AudioFormat format) {
        final float[] fbuf = new float[buf.length / format.getFrameSize()];
        for (int pos = 0; pos < buf.length; pos += format.getFrameSize()) {
            final int sample = format.isBigEndian()
                    ? byteToIntBigEndian(buf, pos, format.getFrameSize())
                    : byteToIntLittleEndian(buf, pos, format.getFrameSize());
            // normalize to [-1,1] (not strictly necessary, but makes things easier)
            fbuf[pos / format.getFrameSize()] = sample / NORMALIZATION_FACTOR_2_BYTES;
        }
        return fbuf;
    }

    private static double[] toMagnitudes(final float[] realPart, final float[] imaginaryPart) {
        final double[] powers = new double[realPart.length / 2];
        for (int i = 0; i < powers.length; i++) {
            powers[i] = Math.sqrt(realPart[i] * realPart[i] + imaginaryPart[i] * imaginaryPart[i]);
        }
        return powers;
    }

    private static int byteToIntLittleEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            // the most significant byte (the last one in little-endian order) carries the
            // sign and must not be masked, otherwise negative samples decode incorrectly
            final int aByte = byteIndex == bytesPerSample - 1
                    ? buf[offset + byteIndex]
                    : buf[offset + byteIndex] & 0xff;
            sample += aByte << (8 * byteIndex);
        }
        return sample;
    }

    private static int byteToIntBigEndian(final byte[] buf, final int offset, final int bytesPerSample) {
        int sample = 0;
        for (int byteIndex = 0; byteIndex < bytesPerSample; byteIndex++) {
            // in big-endian order the first byte is the most significant one and carries the sign
            final int aByte = byteIndex == 0
                    ? buf[offset + byteIndex]
                    : buf[offset + byteIndex] & 0xff;
            sample += aByte << (8 * (bytesPerSample - byteIndex - 1));
        }
        return sample;
    }

}
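
To make the resolution comment in the code concrete: with the 256-byte buffer and the 2-byte mono frames of this format, numberOfSamples is 256/2 = 128, so each frequency bin covers 44100/128 ≈ 344.5 Hz. A 2048-byte buffer (N=1024) narrows that to roughly 43 Hz, which is the resolution used in the walkthrough below.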

What does the Fourier Transform do?

In very simple terms: While a PCM signal encodes audio in the time domain, a Fourier transformed signal encodes audio in the frequency domain. What does this mean?

In PCM, each value encodes an amplitude. You can imagine this like the membrane of a speaker swinging back and forth with certain amplitudes. The position of the speaker membrane is sampled a certain number of times per second (the sampling rate). In your example the sampling rate is 44100 Hz, i.e. 44100 times per second, which is the typical rate for CD-quality audio. For your purposes you probably don't need a rate this high.
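
For a concrete example: a 441 Hz tone sampled at 44100 Hz completes one full swing of the membrane every 44100/441 = 100 samples.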

To transform from the time domain to the frequency domain, you take a certain number of samples (let's say N=1024) and transform them using the fast Fourier transform (FFT). In primers about the Fourier transform you will see a lot about the continuous case, but what you need to pay attention to is the discrete case (the discrete Fourier transform, DFT), because we are dealing with digital signals, not analog ones.
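
For reference, the DFT of N samples x[0], ..., x[N-1] is defined (in the standard textbook form, not taken from the original answer) as:

X[k] = sum over n from 0 to N-1 of x[n] * e^(-2*pi*i*k*n/N), for k = 0, ..., N-1

The FFT computes exactly these N complex values X[k]; it is simply a faster algorithm for the same result, taking O(N log N) instead of O(N^2) steps.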

So what happens when you transform 1024 samples using the DFT (via its fast implementation, the FFT)? Typically, the input samples are real numbers, not complex numbers, but the output of the DFT is complex. This is why you usually get two output arrays from one input array: one for the real part and one for the imaginary part. Together they form one array of complex numbers, which represents the frequency spectrum of your input samples.

The spectrum is complex because it has to encode two aspects: magnitude (amplitude) and phase. Imagine a sine wave with amplitude 1. As you might remember from math way back, a sine wave crosses through the origin (0, 0), while a cosine wave cuts the y-axis at (0, 1). Apart from this shift, both waves are identical in amplitude and shape. This shift is called phase. In your context you don't care about phase, only about amplitude/magnitude, but the complex numbers you get encode both.

To convert one of those complex numbers (r, i) to a simple magnitude value (how loud a certain frequency is), you simply calculate m=sqrt(r*r+i*i). The outcome is always positive. A simple way to see why this works is to imagine a Cartesian plane and treat (r, i) as a vector on that plane: by the Pythagorean theorem, the length of that vector from the origin is exactly m=sqrt(r*r+i*i).
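
For example, the complex value (r, i) = (3, 4) has the magnitude m = sqrt(3*3 + 4*4) = sqrt(25) = 5, regardless of the phase it encodes.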

Now we have magnitudes. But how do they relate to frequencies? Each of the magnitude values corresponds to a certain (linearly spaced) frequency. The first thing to understand is that the output of the FFT is symmetric (mirrored at the midpoint). So of the 1024 complex numbers, only the first 512 are of interest to us. And which frequencies do those cover? Because of the Nyquist–Shannon sampling theorem, a signal sampled at SR=44100 Hz cannot contain information about frequencies greater than F=SR/2=22050 Hz (you may realize that this is the upper boundary of human hearing, which is why it was chosen for CDs). So the first 512 complex values you get from the FFT for 1024 samples of a signal sampled at 44100 Hz cover the frequencies 0 Hz - 22050 Hz. Each so-called frequency bin covers 2F/N = SR/N = 22050/512 Hz ≈ 43 Hz (the bandwidth of a bin).
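
To pick the right bin for a target frequency you can invert that relationship. A minimal sketch (frequencyToBin is an illustrative helper, not part of the original answer's code):

    private static int frequencyToBin(final double frequency, final double sampleRate, final int numberOfSamples) {
        // each bin covers sampleRate / numberOfSamples Hz
        return (int) Math.round(frequency * numberOfSamples / sampleRate);
    }

With SR=44100 Hz and N=1024, frequencyToBin(11025, 44100, 1024) yields the index 256 used in the next example.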

So the bin for 11025 Hz is right at index 512/2=256, and you will find its magnitude at m[256].

To put this to work in your application you need to understand one more thing: 1024 samples of a 44100 Hz signal cover a very short amount of time, namely about 23 ms. Over such a short window you will see sudden peaks. It's better to aggregate several of those 1024-sample windows into one value before thresholding, as sketched below. Alternatively you could use a longer DFT, e.g. 1024*64 samples; however, I advise against making the DFT very long, as it creates a large computational burden.
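
One way to do that aggregation, sketched as an extra method for the AudioLED class above (updateLeds, bin, threshold, frameCount and setLeds are hypothetical names chosen for illustration; the class would also need import java.io.IOException):

    // Sketch: average the magnitude of one frequency bin (bin < N/2) over several
    // FFT frames, then switch the LEDs based on a threshold. setLeds(...) stands in
    // for whatever actually controls your hardware.
    private static void updateLeds(final AudioInputStream audioStream, final JavaFFT fft,
                                   final AudioFormat format, final byte[] buf,
                                   final int bin, final double threshold, final int frameCount)
            throws IOException {
        double sum = 0;
        for (int frame = 0; frame < frameCount; frame++) {
            audioStream.read(buf); // as above: don't ignore the return value in real code
            final float[] samples = decode(buf, format);
            final float[][] transformed = fft.transform(samples);
            final double[] magnitudes = toMagnitudes(transformed[0], transformed[1]);
            sum += magnitudes[bin];
        }
        // hypothetical LED control: on if the averaged magnitude exceeds the threshold
        setLeds(sum / frameCount > threshold);
    }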
