Producing spectrogram from microphone


Below I have code that will take input from a microphone, and if the average of the audio block passes a certain threshold it will produce a spectrogram of the audio block (which is 30 ms long). Here is what a generated spectrogram looks like in the middle of normal conversation:

From what I have seen, this doesn't look anything like what I'd expect a spectrogram to look like given the audio and its environment. I was expecting something more like the following (transposed to preserve space):

The microphone I'm recording with is the default on my Macbook, any suggestions on what's going wrong?


record.py:

import pyaudio
import struct
import math
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt


THRESHOLD = 40 # dB
RATE = 44100
INPUT_BLOCK_TIME = 0.03 # 30 ms
INPUT_FRAMES_PER_BLOCK = int(RATE * INPUT_BLOCK_TIME)

def get_rms(block):
    return np.sqrt(np.mean(np.square(block)))

class AudioHandler(object):
    def __init__(self):
        self.pa = pyaudio.PyAudio()
        self.stream = self.open_mic_stream()
        self.threshold = THRESHOLD
        self.plot_counter = 0

    def stop(self):
        self.stream.close()

    def find_input_device(self):
        device_index = None
        for i in range( self.pa.get_device_count() ):
            devinfo = self.pa.get_device_info_by_index(i)
            print('Device {}: {}'.format(i, devinfo['name']))

            for keyword in ['mic','input']:
                if keyword in devinfo['name'].lower():
                    print('Found an input: device {} - {}'.format(i, devinfo['name']))
                    device_index = i
                    return device_index

        if device_index is None:
            print('No preferred input found; using default input device.')

        return device_index

    def open_mic_stream( self ):
        device_index = self.find_input_device()

        stream = self.pa.open(  format = pyaudio.paInt16,
                                channels = 1,
                                rate = RATE,
                                input = True,
                                input_device_index = device_index,
                                frames_per_buffer = INPUT_FRAMES_PER_BLOCK)

        return stream

    def processBlock(self, snd_block):
        f, t, Sxx = signal.spectrogram(snd_block, RATE)
        plt.pcolormesh(t, f, Sxx)
        plt.ylabel('Frequency [Hz]')
        plt.xlabel('Time [sec]')
        plt.savefig('data/spec{}.png'.format(self.plot_counter), bbox_inches='tight')
        self.plot_counter += 1

    def listen(self):
        try:
            raw_block = self.stream.read(INPUT_FRAMES_PER_BLOCK, exception_on_overflow = False)
            count = len(raw_block) // 2
            format = '%dh' % (count)
            snd_block = np.array(struct.unpack(format, raw_block))
        except Exception as e:
            print('Error recording: {}'.format(e))
            return

        amplitude = get_rms(snd_block)
        if amplitude > self.threshold:
            self.processBlock(snd_block)
        else:
            pass

if __name__ == '__main__':
    audio = AudioHandler()
    for i in range(0,100):
        audio.listen()


Edits based on comments:

If we constrain the rate to 16000 Hz and use a logarithmic scale for the colormap, this is an output for tapping near the microphone:

Which still looks slightly odd to me, but also seems like a step in the right direction.

Using SoX and comparing with a spectrogram generated from my program:

Solution

First, observe that your code plots up to 100 spectrograms (if processBlock is called multiple times) on top of each other and you only see the last one. You may want to fix that. Furthermore, I assume you know why you want to work with 30ms audio recordings. Personally, I can't think of a practical application where 30ms recorded by a laptop microphone could give interesting insights. It hinges on what you are recording and how you trigger the recording, but this issue is tangential to the actual question.
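One minimal way to address the overplotting (a sketch, assuming you want one image file per block, as `plot_counter` suggests): clear the current figure before drawing. The `Agg` backend and the `save_spectrogram` helper name are my own additions, not part of the original code.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless, file-only plotting
import matplotlib.pyplot as plt
import numpy as np

def save_spectrogram(t, f, Sxx, path):
    plt.clf()  # clear the previous spectrogram instead of drawing on top of it
    plt.pcolormesh(t, f, Sxx)
    plt.ylabel('Frequency [Hz]')
    plt.xlabel('Time [sec]')
    plt.savefig(path, bbox_inches='tight')
```

Calling `plt.clf()` (or creating a fresh `Figure` per block) keeps each saved `specN.png` independent of the blocks before it.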

Otherwise the code works perfectly. With just a few small changes in the processBlock function, applying some background knowledge, you can get informative and aesthetic spectrograms.

So let's talk about actual spectrograms. I'll take the SoX output as reference. The colorbar annotation says that it is dBFS1, which is a logarithmic measure (dB is short for Decibel). So, let's first convert the spectrogram to dB:

f, t, Sxx = signal.spectrogram(snd_block, RATE)
dBS = 10 * np.log10(Sxx)  # convert to dB
plt.pcolormesh(t, f, dBS)

This improved the color scale. Now we see noise in the higher frequency bands that was hidden before. Next, let's tackle time resolution. The spectrogram divides the signal into segments (default length is 256) and computes the spectrum for each. This means we have excellent frequency resolution but very poor time resolution because only a few such segments fit into the signal window (which is about 1300 samples long). There is always a trade-off between time and frequency resolution. This is related to the uncertainty principle. So let's trade some frequency resolution for time resolution by splitting the signal into shorter segments:

f, t, Sxx = signal.spectrogram(snd_block, RATE, nperseg=64)

Great! Now we got a relatively balanced resolution on both axes - but wait! Why is the result so pixelated?! Actually, this is all the information there is in the short 30ms time window. There are only so many ways 1300 samples can be distributed in two dimensions. However, we can cheat a bit and use higher FFT resolution and overlapping segments. This makes the result smoother although it does not provide additional information:

f, t, Sxx = signal.spectrogram(snd_block, RATE, nperseg=64, nfft=256, noverlap=60)

Behold pretty spectral interference patterns. (These patterns depend on the window function used, but let's not get caught in details, here. See the window argument of the spectrogram function to play with these.) The result looks nice, but actually does not contain any more information than the previous image.
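As a sketch of playing with that argument (the window names are ones `scipy.signal.get_window` accepts, and the random block merely stands in for a real recording), each window produces a spectrogram of the same shape but with a different interference pattern:

```python
import numpy as np
from scipy import signal

rate = 44100
rng = np.random.default_rng(0)
snd_block = rng.standard_normal(int(rate * 0.03))  # stand-in for a 30 ms recorded block

# Same segmentation as above, but with different window functions.
# ('tukey', 0.25) is scipy's default; 'hann' and 'blackman' taper more strongly.
for win in [('tukey', 0.25), 'hann', 'blackman']:
    f, t, Sxx = signal.spectrogram(snd_block, rate, window=win,
                                   nperseg=64, nfft=256, noverlap=60)
    print(win, Sxx.shape)
```

Stronger tapering trades main-lobe width for side-lobe suppression, which is why the fringes move around as you change the window.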

To make the result more SoX-like, observe that the SoX spectrogram is rather smeared on the time axis. You get this effect by using the original low time resolution (long segments) but letting them overlap for smoothness:

f, t, Sxx = signal.spectrogram(snd_block, RATE, noverlap=250)

I personally prefer the third solution, but you will need to find your own preferred time/frequency trade-off.

Finally, let's use a colormap that is more like SoX's:

plt.pcolormesh(t, f, dBS, cmap='inferno')

A short comment on the following line:

THRESHOLD = 40 # dB

The threshold is compared against the RMS of the input signal, which is not measured in dB but raw amplitude units.
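If a decibel threshold is really what you want, a minimal sketch (my own `rms_dbfs` helper, assuming 16-bit samples so full scale is 32768) converts the block's RMS to dBFS before comparing:

```python
import numpy as np

FULL_SCALE = 32768.0  # assumption: pyaudio.paInt16 samples

def rms_dbfs(block):
    """RMS of a sample block expressed in dBFS (at most 0 dB at full scale)."""
    rms = np.sqrt(np.mean(np.square(block.astype(np.float64))))
    return 20 * np.log10(rms / FULL_SCALE)

# A constant full-scale block sits at 0 dBFS; half amplitude is about -6 dBFS.
print(rms_dbfs(np.full(1024, 32768.0)))
print(rms_dbfs(np.full(1024, 16384.0)))
```

The gate in `listen` would then compare `rms_dbfs(snd_block)` against a negative threshold such as -40 dBFS, rather than comparing raw amplitude against 40.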


1 Apparently FS is short for full scale. dBFS means that the dB measure is relative to the maximum range. 0 dB is the loudest signal possible in the current representation, so actual values must be <= 0 dB.
