What are chunks, samples and frames when using pyaudio


Question

After going through the pyaudio documentation and reading some other articles on the web, I am not sure whether my understanding is correct.

This is the audio recording code found on PyAudio's site:

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

stream.stop_stream()
stream.close()
p.terminate()

If I add these lines, I can play back whatever I recorded:

play=pyaudio.PyAudio()
stream_play=play.open(format=FORMAT,
                      channels=CHANNELS,
                      rate=RATE,
                      output=True)
for data in frames: 
    stream_play.write(data)
stream_play.stop_stream()
stream_play.close()
play.terminate()

  1. "RATE" is the number of samples collected per second.
  2. "CHUNK" is the number of frames in the buffer.
  3. Each frame has 2 samples, because "CHANNELS=2".
  4. The size of each sample is 2 bytes, as returned by pyaudio.get_sample_size(pyaudio.paInt16).
  5. Therefore the size of each frame is 4 bytes.
  6. In the "frames" list, the size of each element must be 1024*4 bytes; for example, frames[0] must be 4096 bytes. However, sys.getsizeof(frames[0]) returns 4133, while len(frames[0]) returns 4096.
  7. The for loop executes int(RATE / CHUNK * RECORD_SECONDS) times, and I can't understand why. Here is the same question answered by "Ruben Sanchez", but I can't be sure his answer is correct, as he says CHUNK=bytes. And according to his explanation, it would have to be int(RATE / (CHUNK*2) * RECORD_SECONDS), as (CHUNK*2) is the number of samples read into the buffer with each iteration.
  8. Finally, when I write print frames[0], it prints gibberish, because it tries to treat the string as ASCII-encoded text, which it is not; it is just a stream of bytes. So how do I print this stream of bytes in hexadecimal using the struct module? And if I later replace each hexadecimal value with values of my choice, will it still produce a playable sound?

Everything I wrote above is my own understanding, and much of it may be wrong.

Recommended answer

  1. "RATE" is the "sampling rate", i.e. the number of frames per second.
  2. "CHUNK" is the (arbitrarily chosen) number of frames per block; in this example the (potentially very long) signal is split into blocks of that size.
  3. Yes, each frame has 2 samples, because "CHANNELS=2", but the term "samples" is seldom used in this context (because it is confusing).
  4. Yes, the size of each sample is 2 bytes (= 16 bits) in this example.
  5. Yes, the size of each frame is 4 bytes.
  6. Yes, each element of "frames" should be 4096 bytes. sys.getsizeof() reports the storage space needed by the Python interpreter for the object, which is typically a bit more than the size of the raw data. len() gives the size of the raw data itself.
  7. RATE * RECORD_SECONDS is the number of frames that should be recorded. Since the for loop is not repeated for each frame but only for each chunk, that number of frames has to be divided by the chunk size CHUNK to get the number of loop iterations. This has nothing to do with samples, so there is no factor of 2 involved.
  8. If you really want to see the hexadecimal values, you can try something like [hex(x) for x in frames[0]] (this works in Python 3, where iterating over bytes yields integers). If you want to get the actual 2-byte numbers, use the struct module with the format string '<h' (signed 16-bit, little-endian), since paInt16 samples are signed; '<H' would interpret them as unsigned.
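
The arithmetic in points 5 to 8 can be checked without any audio hardware. The following is only a minimal sketch under stated assumptions: it does not import pyaudio, it hard-codes 2 bytes per sample for paInt16 (the value the question reports from pyaudio.get_sample_size), it assumes little-endian byte order, and it uses a block of silence in place of a real recorded chunk. The helper decode_chunk is a hypothetical name introduced for illustration.

```python
import struct
import sys

# Values copied from the recording script in the question.
CHUNK = 1024          # frames per chunk
CHANNELS = 2          # samples per frame
RATE = 44100          # frames per second
RECORD_SECONDS = 5
SAMPLE_WIDTH = 2      # bytes per sample for paInt16 (assumed, not queried from pyaudio)

bytes_per_frame = CHANNELS * SAMPLE_WIDTH        # point 5
bytes_per_chunk = CHUNK * bytes_per_frame        # point 6
num_chunks = int(RATE / CHUNK * RECORD_SECONDS)  # point 7: frames wanted / frames per chunk
frames_recorded = num_chunks * CHUNK             # slightly fewer than RATE * RECORD_SECONDS

# Stand-in for one element of the "frames" list: silence of the right length.
fake_chunk = bytes(bytes_per_chunk)
# sys.getsizeof() counts the Python object header too, so it exceeds len().
interpreter_overhead = sys.getsizeof(fake_chunk) - len(fake_chunk)


def decode_chunk(data, channels=CHANNELS):
    """Hypothetical helper: split raw signed 16-bit little-endian audio into frames."""
    count = len(data) // SAMPLE_WIDTH
    samples = struct.unpack("<%dh" % count, data)
    # Samples are interleaved (left, right, left, right, ...), so group per frame.
    return [samples[i:i + channels] for i in range(0, count, channels)]


# Two hand-made stereo frames, packed the same way the raw audio is assumed to be.
raw = struct.pack("<4h", 1, -1, 256, -32768)
decoded = decode_chunk(raw)

print(bytes_per_frame, bytes_per_chunk, num_chunks, frames_recorded)
print(raw.hex())   # hexadecimal view of the raw bytes
print(decoded)     # one tuple per frame, one number per channel
```

Because the division is truncated to an integer, the loop records 215 chunks, which is 220160 frames rather than the full 220500. On a real recording, decode_chunk(frames[0]) should return 1024 tuples of two numbers each, provided the stream really delivers little-endian 16-bit data.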

You might be interested in my tutorial about reading WAV files with the wave module, which covers some of your questions in more detail: http://nbviewer.jupyter.org/github/mgeier/python-audio/blob/master/audio-files/audio-files-with-wave.ipynb
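
The snippet below is not taken from that tutorial. It is a rough sketch of how the standard-library wave module uses the same vocabulary (channels, sample width, frame rate, number of frames). It assumes three chunks of silence in place of the recorded "frames" list and writes to a temporary directory; the file name output.wav is borrowed from the question's WAVE_OUTPUT_FILENAME.

```python
import os
import tempfile
import wave

CHANNELS = 2
SAMPLE_WIDTH = 2   # bytes per sample, matching paInt16 (assumed)
RATE = 44100
CHUNK = 1024

# Stand-in for the recorded "frames" list: three chunks of silence.
frames = [bytes(CHUNK * CHANNELS * SAMPLE_WIDTH) for _ in range(3)]

path = os.path.join(tempfile.mkdtemp(), "output.wav")

# Write the chunks back to back; the WAV header stores the stream parameters.
with wave.open(path, "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

# Read the file back: the module counts in frames, not in bytes or samples.
with wave.open(path, "rb") as rf:
    info = (rf.getnchannels(), rf.getsampwidth(), rf.getframerate(), rf.getnframes())
    first_chunk = rf.readframes(CHUNK)

print(info)
print(len(first_chunk))
```

Note that readframes(CHUNK) asks for 1024 frames and returns 4096 bytes, the same relationship as between CHUNK and len(frames[0]) in the recording loop.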
