Implement realtime signal processing in Python - how to capture audio continuously?


Question

I'm planning to implement a "DSP-like" signal processor in Python. It should capture small fragments of audio via ALSA, process them, then play them back via ALSA.

To get things started, I wrote the following (very simple) code.

import alsaaudio

inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
inp.setchannels(1)
inp.setrate(96000)
inp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
inp.setperiodsize(1920)

outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
outp.setchannels(1)
outp.setrate(96000)
outp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
outp.setperiodsize(1920)

while True:
    l, data = inp.read()
    # TODO: Perform some processing.
    outp.write(data)

The problem is that the audio "stutters" and is not gapless. I tried experimenting with the PCM mode, setting it to either PCM_ASYNC or PCM_NONBLOCK, but the problem remains. I think the problem is that samples "between" two subsequent calls to "inp.read()" are lost.

Is there a way to capture audio "continuously" in Python (preferably without the need for too "specific"/"non-standard" libraries)? I'd like the signal to always be captured "in the background" into some buffer, from which I can read some "momentary state", while audio continues to be captured into the buffer even while I perform my read operations. How can I achieve this?

Even if I use a dedicated process/thread to capture the audio, this process/thread will always at least have to (1) read audio from the source, (2) then put it into some buffer (from which the "signal processing" process/thread then reads). These two operations will therefore still be sequential in time and thus samples will get lost. How do I avoid this?

Thank you very much for your advice!

EDIT 2: Now I have it running.

import alsaaudio
from multiprocessing import Process, Queue
import numpy as np
import struct

"""
A class implementing buffered audio I/O.
"""
class Audio:

    """
    Initialize the audio buffer.
    """
    def __init__(self):
        #self.__rate = 96000
        self.__rate = 8000
        self.__stride = 4
        self.__pre_post = 4
        self.__read_queue = Queue()
        self.__write_queue = Queue()

    """
    Reads audio from an ALSA audio device into the read queue.
    Supposed to run in its own process.
    """
    def __read(self):
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(self.__rate)
        inp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        inp.setperiodsize(self.__rate / 50)

        while True:
            _, data = inp.read()
            self.__read_queue.put(data)

    """
    Writes audio to an ALSA audio device from the write queue.
    Supposed to run in its own process.
    """
    def __write(self):
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(self.__rate)
        outp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        outp.setperiodsize(self.__rate / 50)

        while True:
            data = self.__write_queue.get()
            outp.write(data)

    """
    Pre-post data into the output buffer to avoid buffer underrun.
    """
    def __pre_post_data(self):
        zeros = np.zeros(self.__rate / 50, dtype = np.uint32)

        for i in range(0, self.__pre_post):
            self.__write_queue.put(zeros)

    """
    Runs the read and write processes.
    """
    def run(self):
        self.__pre_post_data()
        read_process = Process(target = self.__read)
        write_process = Process(target = self.__write)
        read_process.start()
        write_process.start()

    """
    Reads audio samples from the queue captured from the reading thread.
    """
    def read(self):
        return self.__read_queue.get()

    """
    Writes audio samples to the queue to be played by the writing thread.
    """
    def write(self, data):
        self.__write_queue.put(data)

    """
    Pseudonymize the audio samples from a binary string into an array of integers.
    """
    def pseudonymize(self, s):
        return struct.unpack(">" + ("I" * (len(s) / self.__stride)), s)

    """
    Depseudonymize the audio samples from an array of integers into a binary string.
    """
    def depseudonymize(self, a):
        s = ""

        for elem in a:
            s += struct.pack(">I", elem)

        return s

    """
    Normalize the audio samples from an array of integers into an array of floats with unity level.
    """
    def normalize(self, data, max_val):
        data = np.array(data)
        bias = int(0.5 * max_val)
        fac = 1.0 / (0.5 * max_val)
        data = fac * (data - bias)
        return data

    """
    Denormalize the data from an array of floats with unity level into an array of integers.
    """
    def denormalize(self, data, max_val):
        bias = int(0.5 * max_val)
        fac = 0.5 * max_val
        data = np.array(data)
        data = (fac * data).astype(np.int64) + bias
        return data

debug = True
audio = Audio()
audio.run()

while True:
    data = audio.read()
    pdata = audio.pseudonymize(data)

    if debug:
        print "[PRE-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))

    ndata = audio.normalize(pdata, 0xffffffff)

    if debug:
        print "[PRE-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
        print "[PRE-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))

    #ndata += 0.01 # When I comment in this line, it wreaks complete havoc!

    if debug:
        print "[POST-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
        print "[POST-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))

    pdata = audio.denormalize(ndata, 0xffffffff)

    if debug:
        print "[POST-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
        print ""

    data = audio.depseudonymize(pdata)
    audio.write(data)

However, when I perform even the slightest modification to the audio data (e.g. comment that line in), I get a lot of noise and extreme distortion at the output. It seems like I don't handle the PCM data correctly. The strange thing is that the output of the "level meter", etc. all appears to make sense. However, the output is completely distorted (but continuous) when I offset it just slightly.

EDIT 3: I just found out that my algorithms (not included here) work when I apply them to wave files. So the problem really does appear to boil down to the ALSA API.

EDIT 4: I finally found the problems. They were the following.

1st - ALSA quietly "fell back" to PCM_FORMAT_U8_LE upon requesting PCM_FORMAT_U32_LE, so I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE.
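One way to catch such a silent format fallback early is to compare the byte count returned by a read with the frame count reported alongside it. The sketch below assumes, as in the code above, that `inp.read()` returns a `(frames, data)` pair. The helper names are hypothetical, and only the standard library is used.

```python
import struct

def observed_sample_width(frames, data, channels=1):
    """Bytes per sample implied by one read of `frames` frames (hypothetical helper)."""
    if frames <= 0:
        raise ValueError("need at least one frame")
    width, remainder = divmod(len(data), frames * channels)
    if remainder:
        raise ValueError("byte count is not a whole number of frames")
    return width

def decode_s32_le(data):
    """Interpret raw bytes as signed 32-bit little-endian samples scaled to [-1.0, 1.0)."""
    count = len(data) // 4
    samples = struct.unpack("<%di" % count, data[:count * 4])
    return [s / 2147483648.0 for s in samples]

def encode_s32_le(samples):
    """Convert floats back to signed 32-bit little-endian bytes, clipping out-of-range values."""
    ints = [max(-2147483648, min(2147483647, int(round(s * 2147483648.0))))
            for s in samples]
    return struct.pack("<%di" % len(ints), *ints)
```

In the capture loop, a check such as `observed_sample_width(l, data) == 4` right after the first read would have exposed the 8-bit fallback. The clipping in `encode_s32_le` keeps a small offset added during processing from wrapping around at full scale.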

2nd - The ALSA output seems to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So you have to set the period size four times as high for output if you use 32-bit sample depth.
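The factor of four follows from the frame size: one frame holds one sample per channel, so a mono 32-bit frame is 4 bytes. A minimal sketch of that arithmetic, with a hypothetical helper name. Whether a given pyalsaaudio version really expects bytes for playback is the observation reported above, not something this sketch verifies.

```python
def period_bytes(frames, channels, sample_bits):
    """Size in bytes of one period of `frames` frames."""
    if sample_bits % 8:
        raise ValueError("sample_bits must be a multiple of 8")
    return frames * channels * (sample_bits // 8)

rate = 8000
frames_per_period = rate // 50                  # 20 ms of audio, as in the code above
print(frames_per_period)                        # 160
print(period_bytes(frames_per_period, 1, 32))   # 640
```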

3rd - Even in Python (where there is a "global interpreter lock"), processes are slow compared to threads. You can get latency down a lot by changing to threads, since the I/O threads basically don't do anything that's computationally intensive.
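The threaded layout can be sketched as follows. The capture thread does nothing but block on the device and hand each period to a `queue.Queue`, so the processing loop never delays the next read. `read_period` is a hypothetical stand-in for a call such as `inp.read()`, and the fake source exists only so that the sketch runs without sound hardware.

```python
import queue
import threading
import time

def capture_loop(read_period, out_queue, stop_event):
    """Read periods until asked to stop and pass each one on untouched."""
    while not stop_event.is_set():
        out_queue.put(read_period())

def start_capture(read_period):
    """Start the capture thread; returns (queue, stop_event, thread)."""
    periods = queue.Queue()
    stop_event = threading.Event()
    thread = threading.Thread(target=capture_loop,
                              args=(read_period, periods, stop_event),
                              daemon=True)
    thread.start()
    return periods, stop_event, thread

def make_fake_source(period_size=640):
    """Hypothetical source: numbered dummy periods instead of real audio."""
    counter = [0]
    def read_period():
        time.sleep(0.001)            # imitates a blocking device read
        counter[0] += 1
        return counter[0], b"\x00" * period_size
    return read_period
```

The processing loop then calls `periods.get()` to receive periods in capture order, and a second thread of the same shape can drain a write queue into the playback device.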

Recommended answer


When you

  1. read one chunk of data,

  2. write one chunk of data,

  3. then wait for the second chunk of data to be read,

then the buffer of the output device will become empty if the second chunk is not shorter than the first chunk.

You should fill up the output device's buffer with silence before starting the actual processing. Then small delays in either the input or output processing will not matter.
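A minimal sketch of that pre-fill step, assuming a signed sample format such as S32_LE, for which silence is all-zero bytes (for unsigned formats silence is the midpoint value instead). `write_period` is a hypothetical stand-in for a call such as `outp.write`, and the number of periods to queue is a tuning choice, not a value given in the answer.

```python
def silence_period(frames, channels=1, sample_bytes=4):
    """One period of silence for a signed PCM format (all-zero bytes)."""
    return bytes(frames * channels * sample_bytes)

def prefill(write_period, periods, frames, channels=1, sample_bytes=4):
    """Write `periods` periods of silence before real audio.

    Returns the latency added by the pre-fill, in frames.
    """
    block = silence_period(frames, channels, sample_bytes)
    for _ in range(periods):
        write_period(block)
    return periods * frames
```

Each pre-filled period buys headroom at the cost of latency: four periods of 160 frames at 8000 Hz add 640 frames, or 80 ms, before the first real sample is heard.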
