鸡尾酒会算法SVD实现...只需一行代码? [英] cocktail party algorithm SVD implementation ... in one line of code?

查看:474
本文介绍了鸡尾酒会算法SVD实现...只需一行代码?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在斯坦福大学的安德鲁·伍(Andrew Ng)在Coursera的机器学习入门讲座的幻灯片中,他给出了鸡尾酒会问题的以下一行八度音阶解决方案,因为音频源是由两个空间分开的麦克风记录的:

In a slide within the introductory lecture on machine learning by Stanford's Andrew Ng at Coursera, he gives the following one line Octave solution to the cocktail party problem given the audio sources are recorded by two spatially separated microphones:

[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');

幻灯片的底部是来源:Sam Roweis,Yair Weiss和Eero Simoncelli",幻灯片的底部是由Te-Won Lee提供的音频剪辑".在视频中,吴教授说,

At the bottom of the slide is "source: Sam Roweis, Yair Weiss, Eero Simoncelli" and at the bottom of an earlier slide is "Audio clips courtesy of Te-Won Lee". In the video, Professor Ng says,

因此,您可能会看这样的无监督学习,并问,'实施此过程有多复杂?"为了构建此应用程序,似乎要进行音频处理,您将编写大量代码,或者链接到处理音频的一堆C ++或Java库中.处理音频的复杂程序:分离音频等等,结果证明该算法可以完成您刚刚听到的内容,只需一行代码即可完成……如此处所示,这确实耗费了研究人员很长时间提出这一行代码.因此,我并不是说这是一个简单的问题.但是事实证明,当您使用正确的编程环境时,许多学习算法实际上都是很短的程序."

"So you might look at unsupervised learning like this and ask, 'How complicated is it to implement this?' It seems like in order to build this application, it seems like to do this audio processing, you would write a ton of code, or maybe link into a bunch of C++ or Java libraries that process audio. It seems like it would be a really complicated program to do this audio: separating out audio and so on. It turns out the algorithm to do what you just heard, that can be done with just one line of code ... shown right here. It did take researchers a long time to come up with this line of code. So I'm not saying this is an easy problem. But it turns out that when you use the right programming environment many learning algorithms will be really short programs."

在视频讲座中单独播放的音频结果并不完美,但在我看来却是惊人的.有人对那一行代码的性能有何见解吗?特别是,没有人知道参考文献来解释Te-Won Lee,Sam Roweis,Yair Weiss和Eero Simoncelli在那一行代码方面的工作吗?

The separated audio results played in the video lecture are not perfect but, in my opinion, amazing. Does anyone have any insight on how that one line of code performs so well? In particular, does anyone know of a reference that explains the work of Te-Won Lee, Sam Roweis, Yair Weiss, and Eero Simoncelli with respect to that one line of code?

更新

为证明算法对麦克风分离距离的敏感性,下面的模拟(以八度为单位)将音调与两个空间分离的音调发生器分开.

To demonstrate the algorithm's sensitivity to microphone separation distance, the following simulation (in Octave) separates the tones from two spatially separated tone generators.

% define model 
f1 = 1100;              % frequency of tone generator 1; unit: Hz 
f2 = 2900;              % frequency of tone generator 2; unit: Hz 
Ts = 1/(40*max(f1,f2)); % sampling period; unit: s 
dMic = 1;               % distance between microphones centered about origin; unit: m 
dSrc = 10;              % distance between tone generators centered about origin; unit: m 
c = 340.29;             % speed of sound; unit: m / s 

% generate tones
figure(1);
t = [0:Ts:0.025];
tone1 = sin(2*pi*f1*t);
tone2 = sin(2*pi*f2*t);
plot(t,tone1); 
hold on;
plot(t,tone2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('tone 1', 'tone 2');
hold off;

% mix tones at microphones
% assume inverse square attenuation of sound intensity (i.e., inverse linear attenuation of sound amplitude)
figure(2);
dNear = (dSrc - dMic)/2;
dFar = (dSrc + dMic)/2;
mic1 = 1/dNear*sin(2*pi*f1*(t-dNear/c)) + \
       1/dFar*sin(2*pi*f2*(t-dFar/c));
mic2 = 1/dNear*sin(2*pi*f2*(t-dNear/c)) + \
       1/dFar*sin(2*pi*f1*(t-dFar/c));
plot(t,mic1);
hold on;
plot(t,mic2,'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -1 1]); legend('mic 1', 'mic 2');
hold off;

% use svd to isolate sound sources
figure(3);
x = [mic1' mic2'];
[W,s,v]=svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
plot(t,v(:,1));
hold on;
maxAmp = max(v(:,1));
plot(t,v(:,2),'r'); xlabel('time'); ylabel('amplitude'); axis([0 0.005 -maxAmp maxAmp]); legend('isolated tone 1', 'isolated tone 2');
hold off;

在便携式计算机上执行大约10分钟后,模拟生成以下三个图形,说明两个孤立的音调具有正确的频率.

After about 10 minutes of execution on my laptop computer, the simulation generates the following three figures illustrating the two isolated tones have the correct frequencies.

但是,将麦克风间隔距离设置为零(即dMic = 0)会使模拟生成以下三个图形,说明模拟无法隔离第二个音调(由svd s中返回的单个有效对角线项确认)矩阵).

However, setting the microphone separation distance to zero (i.e., dMic = 0) causes the simulation to instead generate the following three figures illustrating the simulation could not isolate a second tone (confirmed by the single significant diagonal term returned in svd's s matrix).

我希望智能手机上的麦克风间距足够大,以产生良好的效果,但是将麦克风间距设置为5.25英寸(即dMic = 0.1333米)会导致模拟产生以下结果,但不尽人意,这些图说明了第一个隔离音中的高频分量.

I was hoping the microphone separation distance on a smartphone would be large enough to produce good results but setting the microphone separation distance to 5.25 inches (i.e., dMic = 0.1333 meters) causes the simulation to generate the following, less than encouraging, figures illustrating higher frequency components in the first isolated tone.

推荐答案

两年后,我也试图弄清楚这一点.但是我得到了答案.希望它将对某人有所帮助.

I was trying to figure this out as well, 2 years later. But I got my answers; hopefully it'll help someone.

您需要2录音.您可以从 http://research.ics.aalto.fi/ica/cocktail/cocktail_en中获得音频示例. cgi .

You need 2 audio recordings. You can get audio examples from http://research.ics.aalto.fi/ica/cocktail/cocktail_en.cgi.

实施参考是 http://www.cs.nyu.edu/~roweis/kica.html

好,这是代码-

[x1, Fs1] = audioread('mix1.wav');
[x2, Fs2] = audioread('mix2.wav');
xx = [x1, x2]';
yy = sqrtm(inv(cov(xx')))*(xx-repmat(mean(xx,2),1,size(xx,2)));
[W,s,v] = svd((repmat(sum(yy.*yy,1),size(yy,1),1).*yy)*yy');

a = W*xx; %W is unmixing matrix
subplot(2,2,1); plot(x1); title('mixed audio - mic 1');
subplot(2,2,2); plot(x2); title('mixed audio - mic 2');
subplot(2,2,3); plot(a(1,:), 'g'); title('unmixed wave 1');
subplot(2,2,4); plot(a(2,:),'r'); title('unmixed wave 2');

audiowrite('unmixed1.wav', a(1,:), Fs1);
audiowrite('unmixed2.wav', a(2,:), Fs1);

这篇关于鸡尾酒会算法SVD实现...只需一行代码?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆