Speaker Recognition

Question

How can I differentiate between two people speaking? For example, if one person says "hello" and then another person says "hello", what kind of signature should I look for in the audio data? Periodicity?

Thanks a lot to anyone who can answer this!

Recommended answer

The solution to this problem lies in digital signal processing (DSP). Speaker recognition is a complex problem that brings computer and communication engineering together. Most speaker identification techniques combine signal processing with machine learning: the system is trained on a database of speakers and then uses the training data for identification. An outline of an algorithm that may be followed:


  1. Record the audio in raw format. This serves as the digital signal that needs to be processed.
  2. Apply some pre-processing routines to the captured signal. These can be as simple as signal normalization, or filtering the signal to remove noise with a band-pass filter covering the normal frequency range of the human voice. A band-pass filter can in turn be built by combining a low-pass and a high-pass filter.
  3. Once the captured signal is reasonably free of noise, the feature extraction phase begins. Well-known techniques for extracting voice features include Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), and simple FFT features.
  4. Now there are two phases: training and testing.
  5. First, the system needs to be trained on the voice features of different speakers before it is able to distinguish between them. To ensure that the features are calculated reliably, it is recommended to collect several (>10) voice samples from the speakers for training purposes.
  6. Training can be done using different techniques, such as neural networks or distance-based classification, to find the differences between the voice features of different speakers.
  7. In the testing phase, the training data is used to find the voice feature set that lies at the lowest distance from the signal being tested. Different distance measures, such as Euclidean or Chebyshev distance, can be used to calculate this proximity.
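The steps above can be sketched roughly as follows. This is only a minimal illustration, not a production speaker recognizer, and it rests on several assumptions that are not part of the answer: it uses NumPy, an 8 kHz sample rate, and an 80 to 3400 Hz voice band; it band-passes by zeroing FFT bins instead of combining a low-pass and a high-pass filter; it uses averaged log band energies as the "simple FFT features"; and it uses a nearest-centroid rule as the distance-based classifier. The function names (`preprocess`, `fft_features`, `train`, `identify`, `synthetic_voice`) are hypothetical, and synthetic harmonic tones stand in for real recordings.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz; assumed for this sketch


def preprocess(signal, rate=SAMPLE_RATE, low_hz=80.0, high_hz=3400.0):
    """Peak-normalize, then band-pass by zeroing FFT bins outside the voice band."""
    signal = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(signal))
    if peak > 0:
        signal = signal / peak
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def fft_features(signal, n_bands=16, frame_len=256):
    """Log energy per frequency band, averaged over Hann-windowed frames."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    bands = np.array_split(power, n_bands, axis=1)
    energies = np.stack([band.sum(axis=1) for band in bands], axis=1)
    return np.log(energies + 1e-10).mean(axis=0)


def train(samples_by_speaker):
    """Training: store one mean feature vector (centroid) per speaker."""
    return {
        name: np.mean([fft_features(preprocess(s)) for s in samples], axis=0)
        for name, samples in samples_by_speaker.items()
    }


def identify(model, signal, metric="euclidean"):
    """Testing: return the speaker whose centroid is closest to the signal."""
    feats = fft_features(preprocess(signal))

    def distance(centroid):
        diff = np.abs(feats - centroid)
        if metric == "chebyshev":
            return diff.max()  # largest difference in any single feature
        return np.sqrt((diff ** 2).sum())  # Euclidean distance

    return min(model, key=lambda name: distance(model[name]))


def synthetic_voice(f0, rng, seconds=0.5, rate=SAMPLE_RATE):
    """Stand-in for a recording: five harmonics of f0 plus a little noise."""
    t = np.arange(int(seconds * rate)) / rate
    tone = sum(np.sin(2 * np.pi * f0 * k * t) / k for k in range(1, 6))
    return tone + 0.05 * rng.standard_normal(len(t))


rng = np.random.default_rng(0)
training = {
    "alice": [synthetic_voice(220.0, rng) for _ in range(12)],
    "bob": [synthetic_voice(120.0, rng) for _ in range(12)],
}
model = train(training)
print(identify(model, synthetic_voice(220.0, rng)))
print(identify(model, synthetic_voice(120.0, rng), metric="chebyshev"))
```

With synthetic tones this clearly separated, both test signals should be assigned to the matching speaker. Real voices overlap far more, which is why MFCC or LPC features and a stronger classifier are usually preferred over raw FFT band energies.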

Two open-source implementations that enable speaker identification are ALIZE (http://mistral.univ-avignon.fr/index_en.html) and MARF (http://marf.sourceforge.net/).

I know it's a bit late to answer this question, but I hope someone finds it useful.
