Efficient columnwise correlation coefficient calculation


Problem description


Original question

I am correlating row P of size n to every column of matrix O of size n×m. I crafted the following code:

import numpy as np

def ColumnWiseCorrcoef(O, P):
    n = P.size
    DO = O - (np.sum(O, 0) / np.double(n))
    DP = P - (np.sum(P) / np.double(n))
    return np.dot(DP, DO) / np.sqrt(np.sum(DO ** 2, 0) * np.sum(DP ** 2))
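As a quick sanity check (mine, not part of the original post), the vectorized result can be compared against np.corrcoef applied one column at a time:

```python
import numpy as np

def ColumnWiseCorrcoef(O, P):
    n = P.size
    DO = O - (np.sum(O, 0) / np.double(n))
    DP = P - (np.sum(P) / np.double(n))
    return np.dot(DP, DO) / np.sqrt(np.sum(DO ** 2, 0) * np.sum(DP ** 2))

rng = np.random.default_rng(0)
O = rng.random((1000, 100))
P = rng.random(1000)

fast = ColumnWiseCorrcoef(O, P)
# Reference: Pearson correlation of P against each column, one at a time.
slow = np.array([np.corrcoef(P, O[:, j])[0, 1] for j in range(O.shape[1])])
assert np.allclose(fast, slow)
```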

It is more efficient than the naive approach:

def ColumnWiseCorrcoefNaive(O, P):
    return np.corrcoef(P,O.T)[0,1:O[0].size+1]

Here are the timings I get with numpy-1.7.1-MKL on an Intel core:

O = np.reshape(np.random.rand(100000), (1000,100))
P = np.random.rand(1000)

%timeit -n 1000 A = ColumnWiseCorrcoef(O, P)
1000 loops, best of 3: 787 us per loop
%timeit -n 1000 B = ColumnWiseCorrcoefNaive(O, P)
1000 loops, best of 3: 2.96 ms per loop

Now the question: can you suggest an even faster version of the code for this problem? Squeezing out an additional 20% would be great.

UPDATE May 2017

After quite some time I got back to this problem, and re-ran and extended both the task and the tests.

  1. Using einsum, I have extended the code to the case where P is not a row but a matrix. So the task is correlating all columns of O to all columns of P.

  2. Being curious about how the same problem can be solved in different languages commonly used for scientific computing, I implemented it (with help from other people) in MATLAB, Julia, and R. MATLAB and Julia are the fastest, and both have a dedicated routine to compute columnwise correlation. R also has a dedicated routine but is the slowest.

  3. In the current version of numpy (1.12.1 from Anaconda), einsum still wins over the dedicated functions I used.

All the scripts and timings are available at https://github.com/ikizhvatov/efficient-columnwise-correlation.
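A sketch of the matrix-to-matrix generalization mentioned in point 1 above, done with einsum. This is my illustration under the stated idea, not the exact script from the repository; the function name columnwise_corr is mine:

```python
import numpy as np

def columnwise_corr(O, P):
    """Correlate every column of O (n x mO) with every column of P (n x mP).

    Returns an (mP, mO) matrix R with R[i, j] = corr(P[:, i], O[:, j]).
    """
    DO = O - O.mean(axis=0)                  # center each column of O
    DP = P - P.mean(axis=0)                  # center each column of P
    cov = np.einsum('nk,nj->kj', DP, DO)     # cross products of centered columns
    varO = np.einsum('nj,nj->j', DO, DO)     # per-column sums of squares
    varP = np.einsum('nk,nk->k', DP, DP)
    return cov / np.sqrt(np.outer(varP, varO))

rng = np.random.default_rng(1)
O = rng.random((500, 8))
P = rng.random((500, 3))
R = columnwise_corr(O, P)
```

When P has a single column this reduces to the row-vs-columns case from the original question.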

Solution

We can introduce np.einsum for this; however, your mileage may vary depending on how numpy was compiled and whether it makes use of SSE2. The extra einsum calls that replace plain summations might seem extraneous, but numpy's ufuncs did not make use of SSE2 until numpy 1.8, while einsum already does, and it lets us avoid a few if statements.

Running this on an Opteron core with Intel MKL BLAS, I get an odd result, as I would expect the dot call to take the majority of the time.

def newColumnWiseCorrcoef(O, P):
    n = P.size
    DO = O - (np.einsum('ij->j', O) / np.double(n))
    # Rebind rather than "P -= ..." so the caller's array is not mutated in place.
    P = P - (np.einsum('i->', P) / np.double(n))
    tmp = np.einsum('ij,ij->j', DO, DO)
    tmp *= np.einsum('i,i->', P, P)        # dot or vdot doesn't really change much here
    return np.dot(P, DO) / np.sqrt(tmp)

O = np.reshape(np.random.rand(100000), (1000,100))
P = np.random.rand(1000)

old = ColumnWiseCorrcoef(O,P)
new = newColumnWiseCorrcoef(O,P)

np.allclose(old,new)
True

%timeit ColumnWiseCorrcoef(O,P)
100 loops, best of 3: 1.52 ms per loop

%timeit newColumnWiseCorrcoef(O,P)
1000 loops, best of 3: 518 µs per loop
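The einsum calls above are just reformulations of ordinary sums; a small check (mine, not the answerer's) of the equivalences being exploited:

```python
import numpy as np

rng = np.random.default_rng(2)
O = rng.random((1000, 100))
DO = O - O.mean(axis=0)
P = rng.random(1000)

# 'ij->j' sums down each column, same as np.sum(O, axis=0)
assert np.allclose(np.einsum('ij->j', O), np.sum(O, axis=0))
# 'ij,ij->j' is the columnwise sum of squares, same as np.sum(DO**2, axis=0)
assert np.allclose(np.einsum('ij,ij->j', DO, DO), np.sum(DO ** 2, axis=0))
# 'i,i->' is the scalar sum of squares, same as np.sum(P**2)
assert np.allclose(np.einsum('i,i->', P, P), np.sum(P ** 2))
```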

Running this again on an Intel system with Intel MKL, I get something more reasonable/expected:

%timeit ColumnWiseCorrcoef(O,P)
1000 loops, best of 3: 524 µs per loop

%timeit newColumnWiseCorrcoef(O,P)
1000 loops, best of 3: 354 µs per loop

Again on the Intel machine, with something a bit bigger:

O = np.random.rand(100000, 1000)   # 1E5 x 1E3; integer shape args, as modern numpy rejects floats here
P = np.random.rand(100000)

%timeit ColumnWiseCorrcoef(O,P)
1 loops, best of 3: 1.33 s per loop

%timeit newColumnWiseCorrcoef(O,P)
1 loops, best of 3: 791 ms per loop
