Find the timestamp of a sound sample of an mp3 with linux or python

Question

I am slowly working on a project where it would be very useful if the computer could find where in an mp3 file a certain sample occurs. I would restrict this problem to a fairly exact snippet of the audio, not, for example, the chorus of a song in a different recording by the same band, where it would become more of a machine learning problem. I am thinking that if the snippet has no noise added and comes from the same file, it should be possible to locate the time at which it occurs without machine learning, just like grep can find the lines in a text file where a word occurs.

In case you don't have an mp3 lying around, you can set up the problem with some music available on the net which is in the public domain, so nobody complains:

curl https://web.archive.org/web/20041019004300/http://www.navyband.navy.mil/anthems/ANTHEMS/United%20Kingdom.mp3 --output godsavethequeen.mp3

It is about one minute long:

exiftool godsavethequeen.mp3 | grep Duration
Duration                        : 0:01:03 (approx)

Now cut out a bit between 30 and 33 seconds (the bit which goes la la la la..):

ffmpeg -ss 30 -to 33 -i godsavethequeen.mp3 gstq_sample.mp3

The two files in the folder:

$ ls -la
-rw-r--r-- 1 cardamom cardamom   48736 Jun 23 00:08 gstq_sample.mp3
-rw-r--r-- 1 cardamom cardamom 1007055 Jun 22 23:57 godsavethequeen.mp3

For some reason exiftool seems to overestimate the duration of the sample:

$ exiftool gstq_sample.mp3 | grep Duration
Duration                        : 6.09 s (approx)

..but I suppose it's only approximate, like it tells you.

This is what I am after:

$ findsoundsample gstq_sample.mp3 godsavethequeen.mp3
start 30 end 33

I am happy with either a bash script or a Python solution, even one using some kind of Python library. Sometimes if you use the wrong tool, the solution might work but look horrible, so whichever tool is more suitable. This is a one-minute mp3. I have not thought about performance yet, just about getting it done at all, but I would like some scalability, e.g. finding ten seconds somewhere in half an hour.

I have been looking at the following resources as I try to solve this myself:

How to use Python and Gracenote?

https://github.com/craigfrancis/audio-detect

https://madmom.readthedocs.io/en/latest/introduction.html

Reading *.wav files in Python

https://github.com/aubio/aubio

aubionset is a good candidate

https://willdrevo.com/fingerprinting-and-audio-recognition-with-python/

Recommended answer

As suggested in Carson's answer (https://stackoverflow.com/a/62579419/2994596), processing the audio gets a lot easier once the files are converted to the .wav format.

You can do the conversion with ffmpeg, following Wernight's answer on reading mp3 in Python:

ffmpeg -i godsavethequeen.mp3 -vn -acodec pcm_s16le -ac 1 -ar 44100 -f wav godsavethequeen.wav
ffmpeg -i gstq_sample.mp3 -vn -acodec pcm_s16le -ac 1 -ar 44100 -f wav gstq_sample.wav

Finding the position of the sample is then mostly a matter of obtaining the peak of the cross-correlation function between the source (godsavethequeen.wav in this case) and the sample to look for (gstq_sample.wav). In essence, this finds the shift at which the sample looks the most like the corresponding portion of the source. This can be done in Python using scipy.signal.correlate.
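To make the peak-finding idea concrete, here is a minimal sketch on synthetic data rather than the actual recordings. The 8000 Hz rate, the random-noise "source" and the variable names are assumptions chosen for illustration; only scipy.signal.correlate comes from the answer itself.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
rate = 8000                              # hypothetical sampling rate in Hz
source = rng.standard_normal(5 * rate)   # 5 s of noise standing in for the music
true_start = 2 * rate                    # the excerpt begins at 2.0 s
snippet = source[true_start:true_start + rate].copy()  # 1 s excerpt

# Full cross-correlation: output index k corresponds to a lag of
# k - (len(snippet) - 1) samples between source and snippet.
z = signal.correlate(source, snippet, mode="full", method="fft")
peak = int(np.argmax(np.abs(z)))
start_index = peak - (len(snippet) - 1)

start_seconds = start_index / rate
end_seconds = (start_index + len(snippet)) / rate
print("start {} end {}".format(start_seconds, end_seconds))
```

Because the excerpt is an exact copy of part of the source, the correlation at the true offset dominates every other lag. With re-encoded mp3 audio the peak would likely be less sharp, but the same index arithmetic applies.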

A small Python script to do just that could look like:

import sys

import numpy as np
from scipy import signal
from scipy.io import wavfile

snippet_path = sys.argv[1]
source_path = sys.argv[2]

# read the sample to look for
rate_snippet, snippet = wavfile.read(snippet_path)
snippet = np.array(snippet, dtype='float')

# read the source
rate, source = wavfile.read(source_path)
source = np.array(source, dtype='float')

# resample the snippet to the source's sampling rate (if required)
if rate != rate_snippet:
    num = int(np.round(rate * len(snippet) / rate_snippet))
    snippet = signal.resample(snippet, num)

# compute the full cross-correlation; output index k corresponds to
# a lag of k - (len(snippet) - 1) samples
z = signal.correlate(source, snippet)

# the strongest peak marks where the snippet best lines up with the source
peak = np.argmax(np.abs(z))
start = (peak - len(snippet) + 1) / rate
end = peak / rate

print("start {} end {}".format(start, end))

Note that for good measure I've included a check to make sure both .wav files have the same sampling rate (and resample as needed), but you could alternatively make sure they are always the same when you convert them from .mp3 format, using the -ar 44100 argument to ffmpeg.
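If you would rather drive the conversion from Python so that both files always get the same rate, a sketch along these lines might work. The helper names wav_command and convert_to_wav are hypothetical; the ffmpeg flags are the ones from the commands above, and ffmpeg is assumed to be on the PATH.

```python
import subprocess


def wav_command(src, dst, rate=44100):
    """Build the ffmpeg argument list for a mono 16-bit PCM .wav at a fixed rate."""
    return ["ffmpeg", "-i", src, "-vn", "-acodec", "pcm_s16le",
            "-ac", "1", "-ar", str(rate), "-f", "wav", dst]


def convert_to_wav(src, dst, rate=44100):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(wav_command(src, dst, rate), check=True)


if __name__ == "__main__":
    # Only print the commands here; call convert_to_wav() to actually run them.
    for name in ("godsavethequeen", "gstq_sample"):
        print(" ".join(wav_command(name + ".mp3", name + ".wav")))
```

Passing the same rate to both conversions makes the resampling branch in the script a safety net instead of the normal path.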
