How to create custom config files in OpenSMILE

Question

I am trying to extract some features from an audio sample using OpenSMILE, but I'm realizing how difficult it is to set up a config file.

The documentation is not very helpful. The best I could do was run some of the sample config files that are provided, see what came out, and then go into the config file and try to determine where the feature was specified. Here's what I did:

I used the default feature set from the INTERSPEECH 2010 Paralinguistic Challenge (IS10_paraling.conf).

I ran it over a sample audio file.

I looked at what came out. Then I read the config file in depth, trying to find out where the feature was specified.

Here's a little markdown table showing the results of my exploration:

| Feature generated | instruction in the conf file                            |
|-------------------|---------------------------------------------------------|
| pcm_loudness      | I see: 'loudness=1'                                     |
| mfcc              | I see a section: [mfcc:cMfcc]                           |
| lspFreq           | no matches for the text 'lspFreq' anywhere              |
| F0finEnv          | I see 'F0finalEnv = 1' under [pitchSmooth:cPitchSmoother] |

What I see is 4 different features, each generated by a different kind of instruction in the config file. Well, for one of them, there was no discernible instruction in the config file that I could find. With no pattern, intuitive syntax, or apparent system, I have no idea how to work out how to specify the features I want to generate.

There are no tutorials, no YouTube videos, no StackOverflow questions, and no blog posts out there talking about how this could be done. That is really surprising, since this is obviously a huge part of using OpenSMILE.

If anyone finds this, can you please advise me on how to create custom config files for OpenSMILE? Thanks!

Recommended answer

Thanks for your interest in openSMILE and your eagerness to build your own configuration files.

Most users in the scientific community actually use openSMILE for its pre-defined config files for the baseline feature sets, which in version 2.3 are even more flexible to use (more command-line options to output to different file formats, etc.).

I admit that the documentation provided is not as good as it could be. However, openSMILE is a very complex piece of software with a lot of functionality, of which only the most important parts are currently well documented.

The best starting point would be to read the openSMILE book and the SIG'MM tutorials, all referenced at http://opensmile.audeering.com/ . The book contains a section on how to write configuration files. The next important element is the online help of the binary:

  • SMILExtract -L lists the available components
  • SMILExtract -H cComponentName lists all options that a given component supports (and thus also the features it can extract), with a short description of each
  • SMILExtract -configDflt cComponentName gives you a template configuration section for the component, with all options listed and set to their defaults

Due to the architecture of openSMILE, which is centered on incremental processing of all audio features, there is no easy syntax (at least not yet) to define the features you want. Rather, you define the processing chain by adding components:

  • data sources read in data (from audio files, CSV files, or a microphone, for example),
  • data processors do signal processing and feature extraction in individual steps (for example, to extract MFCC: windowing, window function, FFT, magnitudes, mel-spectrum, cepstral coefficients); there is one data processor for each step,
  • data sinks write data to output files, send results to a server, etc.

You connect the components via the "reader.dmLevel" and "writer.dmLevel" options. These define the name of a data memory level that the components use to exchange data. Only one component may write to a given level, i.e. writer.dmLevel=levelName defines the level and may appear only once. Multiple components can read from this level by setting reader.dmLevel=levelName.
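The chain described above can be sketched as a small config for MFCC extraction. This is a minimal illustration, not one of the shipped configs: the instance names (waveIn, frame, win, ...), the level names (wave, frames, ...), the file names, and the parameter values are assumptions chosen for the example. The option names are written from memory of the standard configs and should be checked against SMILExtract -H cComponentName for your version.

```ini
; Register every component instance with the component manager.
[componentInstances:cComponentManager]
instance[dataMemory].type = cDataMemory
instance[waveIn].type = cWaveSource
instance[frame].type = cFramer
instance[win].type = cWindower
instance[fft].type = cTransformFFT
instance[fftmag].type = cFFTmagphase
instance[melspec].type = cMelspec
instance[mfcc].type = cMfcc
instance[csvSink].type = cCsvSink

; Data source: reads the audio file and defines level "wave".
[waveIn:cWaveSource]
writer.dmLevel = wave
filename = input.wav

; Data processors: each reads one level and defines exactly one new level.
[frame:cFramer]
reader.dmLevel = wave
writer.dmLevel = frames
frameSize = 0.025
frameStep = 0.010

[win:cWindower]
reader.dmLevel = frames
writer.dmLevel = winframes
winFunc = ham

[fft:cTransformFFT]
reader.dmLevel = winframes
writer.dmLevel = fft

[fftmag:cFFTmagphase]
reader.dmLevel = fft
writer.dmLevel = fftmag

[melspec:cMelspec]
reader.dmLevel = fftmag
writer.dmLevel = melspec
nBands = 26

[mfcc:cMfcc]
reader.dmLevel = melspec
writer.dmLevel = mfcc
firstMfcc = 1
lastMfcc = 12

; Data sink: reads level "mfcc" and writes it to a CSV file.
[csvSink:cCsvSink]
reader.dmLevel = mfcc
filename = mfcc.csv
```

Saved under a hypothetical name such as my_mfcc.conf, a config like this would typically be run with SMILExtract -C my_mfcc.conf. Note how every writer.dmLevel name appears exactly once, while any number of later sections could set reader.dmLevel to the same name.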

In each component you then set the options that enable the computation of features, along with their parameters. To answer your question about lspFreq: this is probably enabled by default in the cLsp component, so you don't see an explicit option for it. In future versions of openSMILE, the practice of setting all options explicitly should be, and will be, followed more strictly.
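As an illustration of options that switch individual features on, the fragment below reuses the two options the question found in IS10_paraling.conf (loudness=1 and F0finalEnv = 1). It is a sketch only: placing loudness in a cIntensity section is an assumption, the instance name intens and all level names are hypothetical, and the sections would only work inside a complete chain such as the one in the shipped config.

```ini
; Assumption: loudness is an option of the cIntensity component.
[intens:cIntensity]
reader.dmLevel = frames
writer.dmLevel = intens
; Disable the intensity output, enable the loudness output.
intensity = 0
loudness = 1

; From the question: F0finalEnv = 1 under [pitchSmooth:cPitchSmoother].
[pitchSmooth:cPitchSmoother]
reader.dmLevel = pitch
writer.dmLevel = pitchSmoothed
F0finalEnv = 1
```

Running SMILExtract -H cIntensity or SMILExtract -H cPitchSmoother shows which of these options exist in your version and what their defaults are, which is also how a default-enabled feature like lspFreq can be tracked down.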

The names of the features in the output are defined automatically by the components. Often each component adds a part to the name, so you can infer the full processing chain from the name. The options nameAppend and copyInputName (available for most data processors) control this behaviour, although some components might internally override them or change the behaviour a bit.
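A small sketch of the two naming options, using a smoothing step as the example. The choice of cContourSmoother, the instance name smo, the level names, and the smaWin value are assumptions made for illustration.

```ini
[smo:cContourSmoother]
reader.dmLevel = mfcc
writer.dmLevel = mfcc_sma
; Keep the incoming feature names ...
copyInputName = 1
; ... and append a suffix that identifies this processing step.
nameAppend = sma
; Smoothing window length in frames.
smaWin = 3
```

With settings like these, the output names would be expected to carry the appended part (the IS10 feature names contain "_sma", for instance), but since components may override the naming, it is worth confirming the actual names in the output header.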

To see the names (and other information) for each data memory level, including which features a component in the configuration produces, you can set the option "printLevelStats=5" in the componentInstances:cComponentManager section.
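In the config file, that option sits next to the instance list. The fragment below shows only the placement; the rest of the instance list is omitted.

```ini
[componentInstances:cComponentManager]
instance[dataMemory].type = cDataMemory
; Print names and other details for every data memory level.
printLevelStats = 5
```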

As everything in openSMILE is built for real-time incremental processing, each data memory level has a buffer, which by default is a ring buffer to keep the memory footprint constant when the application runs for a longer time. Sometimes you might want to summarise features over a window of a given length (e.g. with the cFunctionals component). In this case you must ensure that the buffer of the input level to this component is large enough to hold the full window. You do this via the following options:

  • writer.levelconf.isRb = 1/0: sets the buffer type to ring buffer (1) or fixed-size buffer (0)
  • writer.levelconf.growDyn = 1/0: if set to 1, the buffer grows dynamically when more data is written to it
  • writer.levelconf.nT = sets the size of the buffer in frames. Alternatively, you can use bufferSizeSec=x to set the size in seconds; it is converted to frames automatically.
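These options belong to the section of the component that writes the level. A minimal sketch, assuming a cFramer instance named frame and illustrative sizes:

```ini
[frame:cFramer]
reader.dmLevel = wave
writer.dmLevel = frames
frameSize = 0.025
frameStep = 0.010
; Ring buffer on level "frames" ...
writer.levelconf.isRb = 1
; ... that keeps a constant size ...
writer.levelconf.growDyn = 0
; ... of 1000 frames, i.e. 10 seconds at a 10 ms frame step.
writer.levelconf.nT = 1000
```

A component that summarises over windows of up to roughly 10 seconds could then read from "frames" without the window falling out of the buffer.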

In most cases the sizes are set correctly automatically. Subsequent levels also inherit the configuration from the previous levels. The exceptions are when you set a cFunctionals component to read the full input (e.g. to produce only one feature vector at the end of the file), in which case you must use growDyn=1 on the level that the functionals component reads from, and when you use a variable framing mode (see below).
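The full-input case might look like the fragment below. The instance and level names are hypothetical, and the functionals selection (functionalsEnabled and the Moments options) is written from memory of the shipped configs, so treat those option names as assumptions to verify with SMILExtract -H cFunctionals.

```ini
; The level that cFunctionals reads from must hold the whole input,
; so it is a non-ring buffer that is allowed to grow.
[mfcc:cMfcc]
reader.dmLevel = melspec
writer.dmLevel = mfcc
writer.levelconf.isRb = 0
writer.levelconf.growDyn = 1

; One functionals vector is produced at the end of the input.
[func:cFunctionals]
reader.dmLevel = mfcc
writer.dmLevel = func
frameMode = full
; Assumed option names: mean and standard deviation per input feature.
functionalsEnabled = Moments
Moments.amean = 1
Moments.stddev = 1
```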

The cFunctionals component provides the frameMode, frameSize, and frameStep options. frameMode can be full (one vector produced at the end of the input/file), list (specify a list of frames), var (receive messages, e.g. from a cTurnDetector component, that define frames on the fly), or fix (fixed-length window). Only in the case of fix does the frameSize option set the size of this window, and frameStep the rate at which the window is shifted forward. In the case of fix, the buffer size of the input level is set correctly automatically; in the other cases you have to set it manually.
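For the fixed-window case, a sketch could look like this. The instance and level names are hypothetical. Spelling the mode as fixed follows my recollection of the shipped configs (the text above abbreviates it as fix), and the unit of frameSize and frameStep is assumed to be seconds; both should be checked with SMILExtract -H cFunctionals.

```ini
[func:cFunctionals]
reader.dmLevel = mfcc
writer.dmLevel = func
; Fixed-length windows ("fix" in the text above).
frameMode = fixed
; Assumed to be in seconds: 2 s windows, shifted forward by 1 s.
frameSize = 2
frameStep = 1
```

Because the window length is known in advance in this mode, no writer.levelconf options are set on the input level here.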

I hope this helps you get started! With every new openSMILE release, we at audEERING are trying to document things a bit better and to unify things across the various components.

We also welcome contributions from the community (e.g. anybody willing to write a graphical configuration file editor where you drag/drop components and connect them graphically? ;)) - although we know that more documentation will make this easier. Until then, you always have the source code to read ;)

Cheers, Florian
