如何向量化Fisher的精确检验? [英] How to vectorize Fisher's exact test?

查看:233
本文介绍了如何向量化Fisher的精确检验?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

是否可以使用费舍尔精确检验的向量化来优化此计算? num_cases>〜1000000时,运行时很麻烦.

Is it possible, and if so how, to optimize this calculation using the vectorization of Fisher's exact test? Runtime is cumbersome when num_cases > ~1000000.

import numpy as np
from scipy.stats import fisher_exact

num_cases = 100
randCounts = np.random.random_integers(100,size=(num_cases,4))

def testFisher(randCounts):
    return [fisher_exact([[r[0],r[1]],[r[2], r[3]]])[0] for r in randCounts]

In [6]: %timeit testFisher(randCounts)
        1 loops, best of 3: 524 ms per loop

推荐答案

以下是使用fisher的答案,具体方法已在渔民.我用numpy手动计算OR.

Here is an answer using fisher exact as implemented in fisher. I compute the OR by hand in numpy.

安装:

# pip install fisher
# or 
# conda install -c bioconda fisher

设置:

import numpy as np
np.random.seed(0)
num_cases = 100
c = np.random.randint(100,size=(num_cases,4), dtype=np.uint)

# head, i.e. 
c[:5]
# array([[44, 47, 64, 67],
#   [67,  9, 83, 21],
#   [36, 87, 70, 88],
#   [88, 12, 58, 65],
#   [39, 87, 46, 88]], dtype=uint64)

执行:

from fisher import pvalue_npy
_, _, twosided = pvalue_npy(c[:, 0], c[:, 1], c[:, 2], c[:, 3])
odds = (c[:, 0] * c[:, 3]) / (c[:, 1] * c[:, 2])

print("result fast p and odds", odds[0], twosided[0])
# result fast p and odds 0.9800531914893617 1.0
print("result slow", fisher_exact([[c[0][0], c[0][1]], [c[0][2], c[0][3]]]))
# result slow (0.9800531914893617, 1.0)

请注意,对于一百万行,只需要两秒钟即可:)

Note that for one million rows it only takes two seconds :)

此外,要计算近似OR,您可能需要在找到比值比之前向表中添加一个伪计数.这通常比inf更有趣,因为您可以比较近似值:):

Also, to compute an approximate OR you might want to add a pseudocount to the table before finding the oddsratio. This is often more interesting than inf, since you can compare the approximations :) :

c2 = c + 1
odds = (c2[:, 0] * c2[:, 3]) / (c2[:, 1] * c2[:, 2])

from 0.0.61> =此方法作为pr.stats.fisher_exact包含在 pyranges 中.

from 0.0.61>= this method is included in pyranges as pr.stats.fisher_exact.

这篇关于如何向量化Fisher的精确检验?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆