Getting max amplitude for an audio file per second

Question

I know there are some similar questions here, but most of them are concerning generating waveform images, which is not what I want.

My goal is to generate a waveform visualization for an audio file, similar to SoundCloud, but not an image. I'd like to have the max amplitude data for each second (or half second) of an audio clip in an array. I could then use this data to create a CSS-based visualization.

Ideally I'd like to get an array that has all the amplitude values for each second as a percentage of the maximum amplitude of the entire audio file. Here's an example:

[
    0.0,  # Relative max amplitude of first second of audio clip (0%)
    0.04,  # Relative max amplitude of second second of audio clip (4%)
    0.15,  # Relative max amplitude of third second of audio clip (15%)
    # Some more
    1.0,  # The highest amplitude of the whole audio clip will be 1.0 (100%)
]

I assume I'll have to use at least numpy and Python's wave module, but I'm not sure how to get the data I want. I'd like to use Python but I'm not completely against using some kind of command-line tool.

Recommended answer

If you allow gstreamer, here is a little script that could do the trick. It accepts any audio file that gstreamer can handle.


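  • Construct a gstreamer pipeline that uses audioconvert to reduce the audio to 1 channel and the level element to get the peaks

  • Run the pipeline until EOS is hit

  • Normalize the peaks using the min/max found.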

Snippet:

import os, sys, pygst
pygst.require('0.10')
import gst, gobject
gobject.threads_init()

def get_peaks(filename):
    global do_run

    pipeline_txt = (
        'filesrc location="%s" ! decodebin ! audioconvert ! '
        'audio/x-raw-int,channels=1,rate=44100,endianness=1234,'
        'width=32,depth=32,signed=(bool)True !'
        'level name=level interval=1000000000 !'
        'fakesink' % filename)
    pipeline = gst.parse_launch(pipeline_txt)

    level = pipeline.get_by_name('level')
    bus = pipeline.get_bus()
    bus.add_signal_watch()

    peaks = []
    do_run = True

    def show_peak(bus, message):
        global do_run
        if message.type == gst.MESSAGE_EOS:
            pipeline.set_state(gst.STATE_NULL)
            do_run = False
            return
        # filter only on level messages
        if message.src is not level or \
           not message.structure.has_key('peak'):
            return
        peaks.append(message.structure['peak'][0])

    # connect the callback
    bus.connect('message', show_peak)

    # run the pipeline until we got eos
    pipeline.set_state(gst.STATE_PLAYING)
    ctx = gobject.gobject.main_context_default()
    while ctx and do_run:
        ctx.iteration()

    return peaks

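def normalize(peaks):
    # rescale the peaks linearly so the lowest maps to 0.0 and the highest to 1.0
    _min = min(peaks)
    _max = max(peaks)
    d = _max - _min
    if d == 0:
        # all peaks are identical: avoid dividing by zero
        return [0.0 for x in peaks]
    return [(x - _min) / d for x in peaks]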

if __name__ == '__main__':
    filename = os.path.realpath(sys.argv[1])
    peaks = get_peaks(filename)

    print 'Sample is %d seconds' % len(peaks)
    print 'Minimum is', min(peaks)
    print 'Maximum is', max(peaks)

    peaks = normalize(peaks)
    print peaks

And an example of output:

$ python gstreamerpeak.py 01\ Tron\ Legacy\ Track\ 1.mp3 
Sample is 182 seconds
Minimum is -349.999999922
Maximum is -2.10678956719
[0.0, 0.0, 0.9274581631597019, 0.9528318436488018, 0.9492396611762614,
0.9523404330322813, 0.9471685835966183, 0.9537281219301242, 0.9473486577135167,
0.9479292126411365, 0.9538221105563514, 0.9483845795252251, 0.9536790832823281,
0.9477264933378022, 0.9480077366961968, ...
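
Note that the minimum and maximum printed above look like decibel values (silence shows up as roughly -350), so normalizing them linearly gives a scale in dB rather than a percentage of the maximum amplitude. If a linear scale is what you want, the peaks could be converted first. The following is only a sketch, assuming the peaks are in dB where 20 dB corresponds to a factor of 10 in amplitude; db_to_relative_amplitude is a hypothetical helper name, not part of gstreamer:

```python
def db_to_relative_amplitude(peaks_db):
    """Convert per-interval peaks in dB to linear amplitudes in [0, 1],
    relative to the loudest interval (assumption: amplitude dB, 20*log10)."""
    if not peaks_db:
        return []
    loudest = max(peaks_db)
    # subtracting the loudest peak makes it map to 10 ** 0 == 1.0
    return [10 ** ((p - loudest) / 20.0) for p in peaks_db]


if __name__ == "__main__":
    # made-up values for illustration: a silent second, a quiet one, the loudest one
    print(db_to_relative_amplitude([-350.0, -22.0, -2.0]))
```

With this conversion a silent second ends up very close to 0 and the loudest second is exactly 1.0, which is closer to the array asked for in the question.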

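If the input is an uncompressed WAV file, the route mentioned in the question (Python's wave module) might also be enough on its own. Below is a minimal sketch under some assumptions: it only handles 16-bit PCM, it uses the standard library instead of numpy to stay self-contained, and peaks_per_second is a made-up name for illustration:

```python
import array
import sys
import wave


def peaks_per_second(path, interval=1.0):
    """Return the max absolute sample of each interval, scaled to [0, 1]
    relative to the loudest interval of the whole file."""
    peaks = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames_per_chunk = max(1, int(wav.getframerate() * interval))
        while True:
            raw = wav.readframes(frames_per_chunk)
            if not raw:
                break
            samples = array.array("h", raw)  # signed 16-bit samples
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            # channels are interleaved, so this is the peak across all channels
            peaks.append(max(abs(s) for s in samples))
    loudest = max(peaks, default=0)
    if loudest == 0:
        # empty or completely silent file
        return [0.0 for _ in peaks]
    return [p / loudest for p in peaks]
```

Calling peaks_per_second('clip.wav', interval=0.5) would give one value per half second. Compressed formats such as mp3 would need to be decoded to WAV first, which is where the gstreamer script above is more convenient.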