Optimize this function with numpy (or other vectorization methods)

Question

I am computing a classic population genetics quantity in Python. I am well aware that there exist many algorithms that do the job, but I wanted to build my own for some reason.

(The formulas were originally posted here as an image, because StackOverflow does not support MathJax; the image is not reproduced.)
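Since the image is missing, here is a reconstruction of the quantities read directly off the loop in the code below. The notation is an assumption: W_k is the weight of subpopulation k, p_{k,i} is the frequency of allele i in subpopulation k, K is the number of subpopulations and I the number of alleles.

```latex
\bar p_i = \sum_{k=1}^{K} W_k \, p_{k,i}, \qquad
H_T = 1 - \sum_{i=1}^{I} \bar p_i^{\,2}, \qquad
H_S = 1 - \sum_{k=1}^{K} W_k \sum_{i=1}^{I} p_{k,i}^{2}, \qquad
F_{ST} = \frac{H_T - H_S}{H_T}
```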

I would like an efficient algorithm to calculate this Fst. So far I have only managed to write it with for loops, and none of the calculations are vectorized. How can I do this calculation using numpy (or other vectorization methods)?

Here is code that I think should do the job:

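def Fst(W, p):
    I = len(p[0])  # number of alleles (columns of p)
    K = len(p)     # number of subpopulations (rows of p)
    H_T = 0
    H_S = 0
    # range instead of Python 2's xrange, so this also runs on Python 3
    for i in range(I):
        bar_p_i = 0
        for k in range(K):
            bar_p_i += W[k] * p[k][i]          # weighted mean frequency of allele i
            H_S += W[k] * p[k][i] * p[k][i]    # weighted sum of squared frequencies
        H_T += bar_p_i*bar_p_i
    H_T = 1 - H_T
    H_S = 1 - H_S
    return (H_T - H_S) / H_T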

def main():
    W = [0.2, 0.1, 0.2, 0.5]
    p = [[0.1,0.3,0.6],[0,0,1],[0.4,0.5,0.1],[0,0.1,0.9]]
    F = Fst(W,p)
    print("Fst = " + str(F))
    return

main()

Answer

There's no reason to use loops here. And you really shouldn't need Numba or Cython for this stuff: linear algebra expressions like the one you have are the whole reason behind vectorized operations in Numpy.

Since this type of problem is going to pop up again and again if you keep using Numpy, I would recommend getting a basic handle on linear algebra in Numpy. You might find this book chapter helpful:

https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html

As for your specific situation: start by creating numpy arrays from your variables:

import numpy as np
W = np.array(W)
p = np.array(p)
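Before writing any vectorized expression it helps to check which axis is which. A small sketch, assuming the data from the question: W has one entry per subpopulation k, and p has one row per subpopulation k and one column per allele i.

```python
import numpy as np

# Data from the question.
W = np.array([0.2, 0.1, 0.2, 0.5])
p = np.array([[0.1, 0.3, 0.6],
              [0.0, 0.0, 1.0],
              [0.4, 0.5, 0.1],
              [0.0, 0.1, 0.9]])

K, I = p.shape           # K subpopulations (rows), I alleles (columns)
print(W.shape, p.shape)  # (4,) (4, 3)
```

So p[k, i] matches p[k][i] in the loop, and any sum over k runs along axis 0 of p.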

Now, your \bar p_i are defined by a dot product, \bar p_i = \sum_k W_k p_{k,i}. That's easy:

bar_p_i = p.T.dot(W)

Note the T, for the transpose. With a 2-D first argument and a 1-D second argument, the dot product sums over the last index of the first array and the only index of the second. Since p is indexed as p[k][i], transposing it moves the subpopulation index k to the last position, so k is the index that gets summed against W.
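A quick sketch that checks the transposed dot product against an explicit loop over k. It also shows `W @ p`, an equivalent form (a swap-in, not what the answer used) in which the length-K vector sits on the left, so the sum over k happens without any transpose.

```python
import numpy as np

W = np.array([0.2, 0.1, 0.2, 0.5])
p = np.array([[0.1, 0.3, 0.6],
              [0.0, 0.0, 1.0],
              [0.4, 0.5, 0.1],
              [0.0, 0.1, 0.9]])

# (I, K) matrix times length-K vector -> length-I vector.
bar_p_dot = p.T.dot(W)

# Length-K vector times (K, I) matrix -> length-I vector, no transpose needed.
bar_p_matmul = W @ p

# Explicit loop mirroring the question's inner loop over k.
bar_p_loop = np.zeros(p.shape[1])
for i in range(p.shape[1]):
    for k in range(p.shape[0]):
        bar_p_loop[i] += W[k] * p[k, i]

print(np.allclose(bar_p_dot, bar_p_loop), np.allclose(bar_p_matmul, bar_p_loop))  # True True
print(bar_p_dot)  # values close to 0.10, 0.21, 0.69
```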

Your H_T is one minus the sum of the squared \bar p_i. That's also easy:

H_T = 1 - (bar_p_i**2).sum()
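The question's loop accumulates `bar_p_i*bar_p_i`, so H_T must subtract a sum of squares. The sketch below also shows why the squaring matters, assuming (as in the example) that each row of p and the weights W each sum to 1: then the \bar p_i themselves sum to 1, and a plain `.sum()` would make H_T zero. `bar_p_i.dot(bar_p_i)` is an equivalent way to write the sum of squares.

```python
import numpy as np

W = np.array([0.2, 0.1, 0.2, 0.5])
p = np.array([[0.1, 0.3, 0.6],
              [0.0, 0.0, 1.0],
              [0.4, 0.5, 0.1],
              [0.0, 0.1, 0.9]])
bar_p_i = p.T.dot(W)

# Sum of squares, matching H_T += bar_p_i*bar_p_i in the loop.
H_T = 1 - (bar_p_i ** 2).sum()
# Same quantity as a dot product of the vector with itself.
H_T_alt = 1 - bar_p_i.dot(bar_p_i)

# Without squaring, normalized frequencies just add up to 1.
print(bar_p_i.sum())  # close to 1.0
print(H_T, H_T_alt)   # both close to 0.4698
```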

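Similarly for your H_S, which is one minus the W-weighted sum over k of the squared p_{k,i}: square the frequencies, take the weighted sum over k with the same transposed dot product, then sum over i:

H_S = 1 - ((p**2).T.dot(W)).sum()

Fst then follows exactly as in your function: F = (H_T - H_S) / H_T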
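Putting the pieces together, here is a minimal loop-free sketch checked against the question's loop version. The names `fst_vectorized` and `fst_loop` are hypothetical, and the sketch uses `W @ p` and `W @ ...` in place of the transposed dot products (same result, different spelling).

```python
import numpy as np

def fst_loop(W, p):
    """The question's loop version (range instead of xrange), for comparison."""
    I, K = len(p[0]), len(p)
    H_T = H_S = 0.0
    for i in range(I):
        bar_p_i = 0.0
        for k in range(K):
            bar_p_i += W[k] * p[k][i]
            H_S += W[k] * p[k][i] * p[k][i]
        H_T += bar_p_i * bar_p_i
    H_T, H_S = 1 - H_T, 1 - H_S
    return (H_T - H_S) / H_T

def fst_vectorized(W, p):
    """Loop-free Fst; W has shape (K,), p has shape (K, I)."""
    W = np.asarray(W, dtype=float)
    p = np.asarray(p, dtype=float)
    bar_p = W @ p                           # weighted mean frequency of each allele i
    H_T = 1 - np.sum(bar_p ** 2)            # 1 - sum of squared mean frequencies
    H_S = 1 - W @ np.sum(p ** 2, axis=1)    # 1 - weighted sum of squared frequencies
    return (H_T - H_S) / H_T

W = [0.2, 0.1, 0.2, 0.5]
p = [[0.1, 0.3, 0.6], [0, 0, 1], [0.4, 0.5, 0.1], [0, 0.1, 0.9]]
print(fst_vectorized(W, p))                              # about 0.3316
print(np.isclose(fst_vectorized(W, p), fst_loop(W, p)))  # True
```

Summing `p ** 2` along axis 1 first collapses the alleles, so the remaining weighted sum over k is a single vector-vector product.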
