如何缩放基于FFT的互相关,使其峰值等于Pearson的rho [英] How do I scale an FFT-based cross-correlation such that its peak is equal to Pearson's rho

查看:92
本文介绍了如何缩放基于FFT的互相关,使其峰值等于Pearson的rho的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题描述

FFT可用于计算两个信号或图像之间的互相关.要确定两个信号 A B 之间的延迟或滞后,只需找到以下信号的峰值即可: IFFT(FFT(A)*conjugate(FFT(B)))

但是,峰值的幅度与各个信号的频谱的幅度有关.因此,要确定皮尔逊相关性(rho),则必须缩放此峰值的幅度通过两个信号中的总能量.

一种实现此目的的方法是通过各个自相关的几何平均值进行归一化.这给出了 rho 的合理近似值,尤其是在样本之间的延迟很小但不是精确值的情况下.

我认为产生此错误的原因是仅针对信号的重叠部分定义了Pearson相关,而归一化因子(两个自相关峰的几何平均值)包括来自非重叠部分的贡献.我考虑了两种方法来解决此问题,并通过FFT生成 rho 的准确值.在第一个示例中(以下称为rho_exact_1),我将样本修整到它们的重叠部分,并从中计算出归一化因子.在第二部分(以下称为rho_exact_2)中,我计算了信号重叠部分中包含的测量值的分数,并将全自相关归一化因子乘以该分数.

均无效!下图显示了使用基于DFT的互相关来计算Pearson的 rho 的三种方法的图解.仅显示互相关峰的区域.每个估算值都接近1.0的正确值,但不等于它.

我用来执行计算的代码如下.我使用一个简单的正弦波作为示例信号.我注意到,如果使用方波(占空比不一定是50%),方法的误差就会改变.

有人可以解释发生了什么事吗?

一个有效的例子

import numpy as np
from matplotlib import pyplot as plt

# make a time vector w/ 256 points
# and a source signal
N_cycles = 10.0
N_points = 256.0

t = np.arange(0,N_cycles*np.pi,np.pi*N_cycles/N_points)
signal = np.sin(t)

use_rect = False
if use_rect:
    threshold = -0.75
    signal[np.where(signal>=threshold)]=1.0
    signal[np.where(signal<threshold)]=-1.0

# normalize the signal (not technically
# necessary for this example, but required
# for measuring correlation of physically
# different signals)
signal = signal/signal.std()

# generate two samples of the signal
# with a temporal offset:
N = 128
offset = 5
sample_1 = signal[:N]
sample_2 = signal[offset:N+offset]

# determine the offset through cross-
# correlation
xc_num = np.abs(np.fft.ifft(np.fft.fft(sample_1)*np.fft.fft(sample_2).conjugate()))
offset_estimate = np.argmax(xc_num)
if offset_estimate>N//2:
    offset_estimate = offset_estimate - N


# for an approximate estimate of Pearson's
# correlation, we normalize by the RMS
# of individual autocorrelations:
autocorrelation_1 = np.abs(np.fft.ifft(np.fft.fft(sample_1)*np.fft.fft(sample_1).conjugate()))
autocorrelation_2 = np.abs(np.fft.ifft(np.fft.fft(sample_2)*np.fft.fft(sample_2).conjugate()))
xc_denom_approx = np.sqrt(np.max(autocorrelation_1))*np.sqrt(np.max(autocorrelation_2))
rho_approx = xc_num/xc_denom_approx
print 'rho_approx',np.max(rho_approx)

# this is an approximation because we've
# included autocorrelation of the whole samples
# instead of just the overlapping portion;
# using cropped versions of the samples should
# yield the correct correlation:
sample_1_cropped = sample_1[offset:]
sample_2_cropped = sample_2[:-offset]

# these should be identical vectors:
assert np.all(sample_1_cropped==sample_2_cropped)

# compute autocorrelations of cropped samples
# and corresponding value for rho
autocorrelation_1_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_1_cropped).conjugate()))
autocorrelation_2_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_2_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
xc_denom_exact_1 = np.sqrt(np.max(autocorrelation_1_cropped))*np.sqrt(np.max(autocorrelation_2_cropped))
rho_exact_1 = xc_num/xc_denom_exact_1
print 'rho_exact_1',np.max(rho_exact_1)

# alternatively we could try to use the
# whole sample autocorrelations and just
# scale by the number of pixels used to
# compute the numerator:
scaling_factor = float(len(sample_1_cropped))/float(len(sample_1))
rho_exact_2 = xc_num/(xc_denom_approx*scaling_factor)
print 'rho_exact_2',np.max(rho_exact_2)

# finally a sanity check: is rho actually 1.0
# for the two signals:
rho_corrcoef = np.corrcoef(sample_1_cropped,sample_2_cropped)[0,1]
print 'rho_corrcoef',rho_corrcoef

x = np.arange(len(rho_approx))
plt.plot(x,rho_approx,label='FFT rho_approx')
plt.plot(x,rho_exact_1,label='FFT rho_exact_1')
plt.plot(x,rho_exact_2,label='FFT rho_exact_2')
plt.plot(x,np.ones(len(x))*rho_corrcoef,'k--',label='Pearson rho')
plt.legend()
plt.ylim((.75,1.25))
plt.xlim((0,20))
plt.show()

解决方案

两个N周期离散信号F和G之间的归一化互相关定义为:

由于分子是两个向量(F和G_x)之间的点积,而分母是这两个向量的范数的乘积,因此标量r_x实际上必须在-1和+1之间,并且它是向量之间的角度(请参见).如果向量F和G_x对齐,则r_x = 1.如果r_x = 1,则由于三角不等式,向量F和G_x对齐.为了确保这些特性,分子的向量必须与分母的向量匹配.

可以使用离散傅立叶变换一次计算所有分子.实际上,这种变换将卷积变成傅立叶空间中的点积.这就是为什么在您执行的测试中,不同的估计归一化互相关不为1的原因.

  1. 对于第一个测试,"aprox",sample_1sample_2均从周期信号中提取.两者的长度相同,但长度不是周期的倍数,因为它是2.5个周期(5pi)(下图).结果,由于dft就像在周期性信号中一样执行相关,因此发现sample_1sample_2没有被完美地相关并且r_x <1.

  2. 对于第二个测试rho_exact_1,对长度为N = 128的信号执行卷积,但是对分母的范数是对大小为N-offset = 128-5的截断向量进行计算的.结果,r_x的属性丢失.另外,必须注意,所提出的卷积和范数未标准化:计算的范数和卷积乘积总体上与所考虑矢量的点数成比例.结果,与前一种情况相比,截断向量的范数略低,并且r_x增加:随着offset增加,可能会遇到大于1的值.

  3. 对于第三项测试rho_exact_2,引入了比例因子以尝试纠正第一项测试:r_x的属性也会丢失,并且当比例因子大于1时,可能会遇到大于一个的值. .

尽管如此,numpy的函数corrcoef()实际上为截断的信号计算了一个等于1的r_x.确实,这些信号完全相同!使用DFT可以获得相同的结果:

xc_num_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
autocorrelation_1_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_1_cropped).conjugate()))
autocorrelation_2_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_2_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
xc_denom_exact_11 = np.sqrt(np.max(autocorrelation_1_cropped))*np.sqrt(np.max(autocorrelation_2_cropped))
rho_exact_11 = xc_num_cropped/xc_denom_exact_11
print 'rho_exact_11',np.max(rho_exact_11)  

要为用户提供较大的r_x值,可以坚持使用第一个测试提供的值,如果帧的长度不是周期的倍数,则该值可以小于相同周期信号的值.为了纠正此缺陷,估计的偏移量也可以取回,并用于构建两个相同长度的裁剪信号.必须重新运行整个相关过程,以获取r_x的新值,而不会因裁剪的帧的长度不是该周期的倍数这一事实而烦恼.

最后,如果DFT是一次针对x的所有值在分子上计算卷积的非常有效的方法,则可以使用numpy.linalg.norm将分母有效地计算为向量的2范数.由于如果第一关联成功,则裁剪信号的argmax(r_x)可能为零,因此使用点积"sample_1_cropped.dot(sample_2_cropped)"来计算r_0就足够了.

Description of the problem

FFT can be used to compute cross-correlation between two signals or images. To determine the delay or lag between two signals A and B, it suffices to locate the peak of: IFFT(FFT(A)*conjugate(FFT(B)))

However, the amplitude of the peak is related to the amplitude of the frequency spectra of the individual signals. Thus to determine the Pearson correlation (rho), the amplitude of this peak must be scaled by the total energy in the two signals.

One way to do this is to normalize by the geometric mean of the individual autocorrelations. This gives a reasonable approximation of rho, especially when the delay between samples is small, but not the exact value.

I thought the reason for this error was that the Pearson correlation is only defined for the overlapping portions of the signal, whereas the normalization factor (the geometric mean of the two autocorrelation peaks) includes contributions from the non-overlapping portions. I considered two approaches for fixing this and producing an exact value for rho via FFT. In the first (called rho_exact_1 below), I trimmed the samples down to their overlapping portions and computed the normalization factor from those. In the second (called rho_exact_2 below), I computed the fraction of the measurements contained in the overlapping portion of the signals and multiplied the full-autocorrelation-normalization factor by that fraction.

Neither works! The figure below shows plots of the three approaches for calculating Pearson's rho using DFT-based cross-correlation. Only the region of the cross-correlation peak is shown. Each estimate is close to the correct value of 1.0, but not equal to it.

The code I used to perform the calculations is below. I used a simple sine wave as an example signal. I noticed that if I use a square-wave (w/ duty cycle not necessarily 50%) the approaches' errors change.

Can somebody explain what's going on?

A working example

import numpy as np
from matplotlib import pyplot as plt

# make a time vector w/ 256 points
# and a source signal
N_cycles = 10.0
N_points = 256.0

t = np.arange(0,N_cycles*np.pi,np.pi*N_cycles/N_points)
signal = np.sin(t)

use_rect = False
if use_rect:
    threshold = -0.75
    signal[np.where(signal>=threshold)]=1.0
    signal[np.where(signal<threshold)]=-1.0

# normalize the signal (not technically
# necessary for this example, but required
# for measuring correlation of physically
# different signals)
signal = signal/signal.std()

# generate two samples of the signal
# with a temporal offset:
N = 128
offset = 5
sample_1 = signal[:N]
sample_2 = signal[offset:N+offset]

# determine the offset through cross-
# correlation
xc_num = np.abs(np.fft.ifft(np.fft.fft(sample_1)*np.fft.fft(sample_2).conjugate()))
offset_estimate = np.argmax(xc_num)
if offset_estimate>N//2:
    offset_estimate = offset_estimate - N


# for an approximate estimate of Pearson's
# correlation, we normalize by the RMS
# of individual autocorrelations:
autocorrelation_1 = np.abs(np.fft.ifft(np.fft.fft(sample_1)*np.fft.fft(sample_1).conjugate()))
autocorrelation_2 = np.abs(np.fft.ifft(np.fft.fft(sample_2)*np.fft.fft(sample_2).conjugate()))
xc_denom_approx = np.sqrt(np.max(autocorrelation_1))*np.sqrt(np.max(autocorrelation_2))
rho_approx = xc_num/xc_denom_approx
print 'rho_approx',np.max(rho_approx)

# this is an approximation because we've
# included autocorrelation of the whole samples
# instead of just the overlapping portion;
# using cropped versions of the samples should
# yield the correct correlation:
sample_1_cropped = sample_1[offset:]
sample_2_cropped = sample_2[:-offset]

# these should be identical vectors:
assert np.all(sample_1_cropped==sample_2_cropped)

# compute autocorrelations of cropped samples
# and corresponding value for rho
autocorrelation_1_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_1_cropped).conjugate()))
autocorrelation_2_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_2_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
xc_denom_exact_1 = np.sqrt(np.max(autocorrelation_1_cropped))*np.sqrt(np.max(autocorrelation_2_cropped))
rho_exact_1 = xc_num/xc_denom_exact_1
print 'rho_exact_1',np.max(rho_exact_1)

# alternatively we could try to use the
# whole sample autocorrelations and just
# scale by the number of pixels used to
# compute the numerator:
scaling_factor = float(len(sample_1_cropped))/float(len(sample_1))
rho_exact_2 = xc_num/(xc_denom_approx*scaling_factor)
print 'rho_exact_2',np.max(rho_exact_2)

# finally a sanity check: is rho actually 1.0
# for the two signals:
rho_corrcoef = np.corrcoef(sample_1_cropped,sample_2_cropped)[0,1]
print 'rho_corrcoef',rho_corrcoef

x = np.arange(len(rho_approx))
plt.plot(x,rho_approx,label='FFT rho_approx')
plt.plot(x,rho_exact_1,label='FFT rho_exact_1')
plt.plot(x,rho_exact_2,label='FFT rho_exact_2')
plt.plot(x,np.ones(len(x))*rho_corrcoef,'k--',label='Pearson rho')
plt.legend()
plt.ylim((.75,1.25))
plt.xlim((0,20))
plt.show()

解决方案

The normalised cross correlation between two N-periodic discrete signals F and G is defined as:

Since the numerator is a dot product between two vectors (F and G_x) and the denominator is the product of the norm of these two vectors, the scalar r_x must indeed lie between -1 and +1 and it is the cosinus of the angle between the vectors (See there). If the vector F and G_x are aligned, then r_x=1. If r_x=1, then the vector F and G_x are aligned due to the triangular inequality. To ensure these properties, the vectors at the numerator must match those at the denominator.

All numerators can be computed at once by using the Discrete Fourier Transform. Indeed, that transform turns the convolution into pointwise products in the Fourier space. Here is why the different estimated normalized cross correlations are not 1 in the tests you performed.

  1. For the first test "approx", sample_1 and sample_2 are both extracted from a periodic signal. Both are of the same length, but the length is not a multiple of the period as it is 2.5 periods (5pi) (figure below). As a result, since the dft performs the correlation as if they where periodic signals, it is found that sample_1 and sample_2 are not perfectly correlated and r_x<1.

  2. For the second test rho_exact_1, the convolution is performed on signals of length N=128, but the norms at the denominator are computed on truncated vectors of size N-offset=128-5. As a result, the properties of r_x are lost. In addition, it must be noticed that the proposed convolution and norms are not normalized: the computed norms and convolution product are globally proportionnal to the number of points of the considered vectors. As a result, the norms of the truncated vectors are slightly lower compared to the previous case and r_x increases: values larger that 1 are likely encountered as the offset increases.

  3. For the third test rho_exact_2, a scaling factor is introduced to try to correct the first test: the properties of r_x are also lost and values larger than one can be encountered as the scaling factor is larger than one.

Nevertheless, the function corrcoef() of numpy actually computes a r_x equal to 1 for the truncated signals. Indeed, these signals are perfectly identical! The same result can be obtained using DFTs:

xc_num_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
autocorrelation_1_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_1_cropped)*np.fft.fft(sample_1_cropped).conjugate()))
autocorrelation_2_cropped = np.abs(np.fft.ifft(np.fft.fft(sample_2_cropped)*np.fft.fft(sample_2_cropped).conjugate()))
xc_denom_exact_11 = np.sqrt(np.max(autocorrelation_1_cropped))*np.sqrt(np.max(autocorrelation_2_cropped))
rho_exact_11 = xc_num_cropped/xc_denom_exact_11
print 'rho_exact_11',np.max(rho_exact_11)  

To provide the user with a significant value for r_x, you can stick to the value provided by the first test, which can be lower than one for identical periodic signals if the length of the frame is not a multiple of the period. To correct this drawback, the estimated offset can also be retreived and used to build two cropped signals of the same length. The whole correlation procedure must be re-run to get a new value for r_x, which will not be plaged by the fact that the length of the cropped frame is not a multiple of the period.

Lastly, if the DFT is a very efficient way to compute the convolution at the numerator for all values of x at once, the denominator can be efficiently computed as 2-norms of vector, using numpy.linalg.norm. Since the argmax(r_x) for the cropped signals will likely be zero if the first correlation was successful, it could be sufficient to compute r_0 using a dot product `sample_1_cropped.dot(sample_2_cropped).

这篇关于如何缩放基于FFT的互相关,使其峰值等于Pearson的rho的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆