Programmatically 'Listening' to Sound (Signal Processing?)

Question

I'm familiar with Computer Vision (well, I know OF it), of which one application can be image recognition, such as Optical Character Recognition, I believe. However, something that I am more interested in is 'computer listening', which I have just learned is considered Digital Signal Processing.

The thing that interests me the most about signal processing is the potential application in music. I remember a while ago I saw a preview of an application (sorry, forgot the name) which could listen to a recording of someone playing a guitar, and automatically graph it out across a timeline with the actual notes/chords that were played. Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing? Signal Processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.

My understanding is that doing this processing on a compressed audio format such as MP3 won't yield the same results as MIDI, which contains separate tracks (maybe I misunderstood). Would an uncompressed format such as PCM do better than MP3? I don't know anything about sound processing, that's just what I'm inferring from what I've read so far.

I have already seen this question, which has great answers and links that cover a lot of my questions. However, most of the links I've found are theoretical, which I'm sure is all interesting and is definitely worth a read given my interest in the subject, but I wanted to know if there are any existing libraries which can facilitate this, or articles pertaining to this subject that are geared towards Computer Science/Programming, with perhaps example code. Even open source sound/music visualizers or any other open source sound processing code would be great.

Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.

Recommended answer

> The thing that interests me the most about signal processing is the potential application in music. I remember a while ago I saw a preview of an application (Sorry, forgot the name)

Maybe Cubase?

> which could listen to a recording of someone playing a guitar, and automatically graph it out across a time-line with the actual notes/chords that were played

Deeply simplified: when you play a note you produce a periodic wave with a given frequency. There's a mathematical trick (the Fourier transform, or DFT in its discrete form) that converts the wave into its spectrum, which shows intensity against frequency instead of against time. For example, a perfect A note from a tuning fork would produce an oscillating wave at 440 Hz. In the time domain this would appear as a sinusoidal wave. In the frequency domain, it would appear as a single, narrow spike centered at 440 Hz.
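To make the tuning-fork example concrete, here is a minimal sketch in plain Python (standard library only). The sample rate, the window length and the helper name `dft_magnitudes` are my own choices for illustration, and the DFT is the naive textbook version, not the fast FFT a real program would use.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
N = 400             # 0.05 s of audio, so each DFT bin is 20 Hz wide


def dft_magnitudes(samples):
    """Naive O(N^2) DFT: magnitude of each bin from 0 Hz up to Nyquist."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


# Time domain: a pure 440 Hz sinusoid, like the tuning fork.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(N)]

# Frequency domain: find the bin holding the single narrow spike.
mags = dft_magnitudes(tone)
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_hz)  # 440.0
```

The window was picked so that 440 Hz falls exactly on a bin. With real recordings the energy smears over neighbouring bins, which is one reason practical code applies a window function first.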

Now, when you play a guitar you don't produce perfect sinusoidal waves. Hitting an A will produce the fundamental frequency, 440 Hz, but also a lot of additional frequencies (e.g. 880 Hz, one octave higher, but also a lot of other higher and lower frequencies), due to the physics of the vibrating string, the material and shape of the guitar, etc. These additional frequencies are called harmonics, and they mix with the fundamental to produce "the sound of the guitar" (what in musical jargon is called timbre). A different instrument (say, a piano) will have a different mix of harmonics with the fundamental, producing a different timbre.
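A rough way to see the fundamental-plus-harmonics picture is to synthesize a note from a few partials and look at its spectrum. The amplitudes in `PARTIALS` are made up for illustration (they are not measured from a real guitar), and `dft_magnitudes` is again a naive helper of my own.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
N = 400             # each DFT bin is 20 Hz wide

# (frequency in Hz, relative amplitude): hypothetical "timbre" of an A note
PARTIALS = [(440, 1.0), (880, 0.5), (1320, 0.25)]


def dft_magnitudes(samples):
    """Naive O(N^2) DFT: magnitude of each bin from 0 Hz up to Nyquist."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


# Mix the fundamental and its harmonics into one waveform.
note = [sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f, a in PARTIALS)
        for t in range(N)]

mags = dft_magnitudes(note)
strongest = max(mags)

# Keep every bin with at least 10% of the strongest bin's magnitude.
peaks = [(k * SAMPLE_RATE / N, round(m / strongest, 2))
         for k, m in enumerate(mags) if m > 0.1 * strongest]
print(peaks)  # [(440.0, 1.0), (880.0, 0.5), (1320.0, 0.25)]
```

Changing the relative amplitudes changes the "instrument" while the note stays an A, which is the timbre idea in miniature.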

What DSP programs do is perform a DFT on the incoming signal. With additional tricks, they find the fundamental and the harmonics, and from those they infer the note you played. This must happen fast, because you may want to detect the note while playing live and trigger special effects. For example, you could hit an A note on the guitar; the DSP recognizes it as an A and replaces it with the A from a piano, so what comes out of the speakers is the sound of a piano.
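One of the simplest possible versions of the "infer the note" step could look like the sketch below. It assumes you already have a list of spectral peaks as `(frequency, strength)` pairs, takes the lowest strong peak as the fundamental, and maps it to the nearest equal-tempered note with A4 = 440 Hz. The function names and the 10% floor are my own assumptions, and real pitch detectors are considerably more robust than this.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def freq_to_note(freq_hz):
    """Nearest equal-tempered note name, using MIDI numbering (A4 = 69 = 440 Hz)."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def guess_fundamental(peaks, floor=0.1):
    """Lowest peak whose strength is at least `floor` times the strongest one."""
    strongest = max(strength for _, strength in peaks)
    return min(freq for freq, strength in peaks if strength >= floor * strongest)


# Hypothetical peaks for an A note with two harmonics.
peaks = [(440.0, 1.0), (880.0, 0.5), (1320.0, 0.25)]
note = freq_to_note(guess_fundamental(peaks))
print(note)  # A4
```

The "lowest strong peak" heuristic breaks down when the fundamental is weak or missing, which does happen with real instruments, so treat it as a starting point only.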

> Using the program, the user was able to move these around and even edit them. Now, obviously this is a lot more complicated, but does it involve the same thing? Signal Processing? I am also interested in possible applications in music visualizers and intelligent lighting systems.

Yes. Once you are in the frequency domain, things get much easier. For example, you could light up one specific light according to the voice frequencies, and another light with the bass drum.
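As a sketch of the lighting idea: compute how much energy falls into a few frequency bands and switch a light on when its band dominates. The band limits in `BANDS`, the 0.5 threshold and the function names are rough guesses of mine, not values from any real lighting system.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
N = 400             # analysis window; each DFT bin is 20 Hz wide

# Frequency ranges in Hz for each light (illustrative guesses).
BANDS = {"bass": (40, 200), "voice": (300, 3000)}


def band_energy(samples, low_hz, high_hz):
    """Sum of squared DFT magnitudes for the bins between low_hz and high_hz."""
    n = len(samples)
    bin_hz = SAMPLE_RATE / n
    total = 0.0
    for k in range(math.ceil(low_hz / bin_hz), int(high_hz / bin_hz) + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        total += abs(acc) ** 2
    return total


def lights(samples, threshold=0.5):
    """Map each band to True when it holds more than `threshold` of the energy."""
    energies = {name: band_energy(samples, lo, hi)
                for name, (lo, hi) in BANDS.items()}
    total = sum(energies.values()) or 1.0  # avoid dividing by zero on silence
    return {name: e / total > threshold for name, e in energies.items()}


def tone(freq_hz):
    """A pure test tone standing in for real audio."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]


print(lights(tone(80)))    # {'bass': True, 'voice': False}
print(lights(tone(1000)))  # {'bass': False, 'voice': True}
```

In a visualizer you would run this on each successive window of the audio stream and drive the lights (or bars on screen) from the per-band values directly.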

> My understanding is that doing this processing on a compressed audio format such as MP3 wont yield the same results as MIDI which contains separate tracks (Maybe I misunderstood).

They are two different things. MP3 is a compressed format derived from a sound wave. Basically it takes the signal that drives the speakers and compresses it. The idea is the same: transform to the frequency domain (much like a DFT), then remove the stuff that is unlikely to be heard (for example, a high pitch that comes right after a high-intensity sound is less likely to be heard, so it gets removed).

MIDI, on the other hand, is a scroll of events (you know, like those player pianos in the Wild West, with the rolling paper scroll). The file contains no music. It contains instead directions for a MIDI player to perform specific notes at specific times with specific instruments. The quality of the "instrument bank" is (among other things) what distinguishes a bad MIDI player (which sounds like a child's toy) from a good MIDI player (which sounds realistic, in particular for pianos and violins; for wind instruments I have yet to hear a realistic one).

To go from MIDI to MP3, you just perform the file through a MIDI player. Going the other way around is a different story altogether, and much more complex, and this is where DSP comes into play, as you said.

It's like boiling a fish tank. You get a fish soup. But to get from the fish soup back to the fish tank is much harder.
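The easy direction (events in, sound out) fits in a few lines. The sketch below is not the real MIDI file format: `EVENTS` is a hypothetical list of `(start, duration, note number)` tuples, and the "instrument bank" is just a sine oscillator, which is roughly the child's-toy end of the quality scale.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed)

# (start in seconds, duration in seconds, MIDI note number); 69 is A4.
# A single A, followed by two notes played together.
EVENTS = [(0.0, 0.5, 69), (0.5, 0.5, 72), (0.5, 0.5, 76)]


def note_to_hz(midi_note):
    """Equal-tempered frequency for a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)


def render(events):
    """Turn note events into a list of audio samples by summing sine tones."""
    length = round(max(start + dur for start, dur, _ in events) * SAMPLE_RATE)
    out = [0.0] * length
    for start, dur, midi_note in events:
        freq = note_to_hz(midi_note)
        first = round(start * SAMPLE_RATE)
        for i in range(round(dur * SAMPLE_RATE)):
            out[first + i] += 0.3 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
    return out


audio = render(EVENTS)
print(len(audio))  # 8000
```

Notice that once the two simultaneous notes are summed into `audio`, nothing in the samples says "there were two notes here". Recovering `EVENTS` from `audio` is the fish-soup-to-fish-tank problem.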

> Would an uncompressed format such as PCM do better than MP3?

PCM is a technique to convert an analog signal to a digital signal. So your question contains a fundamental misunderstanding: no PCM format exists as such (the RAW format comes close, containing basically nothing but the raw sample data). If you are asking whether an uncompressed WAV (which contains PCM data) is better than MP3, then yes, but the question is sometimes how much this improvement really matters to the human ear, and how much postprocessing you have to perform on that data.
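To illustrate "a WAV contains PCM data", here is a small sketch using Python's standard `wave` and `struct` modules. It writes 16-bit mono samples into an in-memory WAV and reads them back unchanged. The sample rate, amplitude and the use of `io.BytesIO` in place of a file on disk are choices made for this example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000  # samples per second (assumed)

# PCM: the waveform measured at regular intervals and stored as integers.
samples = [int(32767 * 0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
           for t in range(SAMPLE_RATE // 10)]  # 0.1 s of a 440 Hz tone

# Wrap the PCM samples in a WAV container (in memory).
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(1)   # mono
    writer.setsampwidth(2)   # 2 bytes = 16 bits per sample
    writer.setframerate(SAMPLE_RATE)
    writer.writeframes(struct.pack("<%dh" % len(samples), *samples))

# Read it back: the samples come out exactly as they went in.
buf.seek(0)
with wave.open(buf, "rb") as reader:
    frame_count = reader.getnframes()
    raw = reader.readframes(frame_count)
decoded = list(struct.unpack("<%dh" % frame_count, raw))

print(decoded == samples)  # True
```

That lossless round trip is the practical difference from MP3, where decoding gives you an approximation of the original samples.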

> know if there are any existing libraries which can facilitate this, or articles pertaining to this subject that geared towards Computer Science/Programming, with perhaps example code. Even open source sound/music visualizers or any other open source sound processing code would be great.

If you like Python, take a look at this page

> Sorry if I didn't make any sense. Like I said, I don't know what I'm talking about.

Neither do I, but I toyed a bit with it.
