Functional Programming: Numpy Vectorizable Function to Create an Expected Values Array


Problem description

I want to run a chi-squared statistical test against some categorical data counts - but to do that, I need to calculate the expected values for each cell of an array matching my observed results array.

The pseudocode for the content of each element e in the array is

e = column_sum * row_sum / total_sum
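
To make the formula concrete, here is a small hand-checkable sketch. The 2x2 table of counts is invented purely for illustration.

```python
import numpy as np

# Hypothetical observed counts (illustrative only).
observed = np.array([[10, 20],
                     [30, 40]])

row_sum = observed[0, :].sum()     # 10 + 20 = 30
column_sum = observed[:, 0].sum()  # 10 + 30 = 40
total_sum = observed.sum()         # 100

# Expected value for the top-left cell: 40 * 30 / 100
e = column_sum * row_sum / total_sum
print(e)  # 12.0
```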

I've written a function that will convert an array into its expected-values counterpart:

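import numpy as np

def gen_expected(a_array):
    # Output array with the same shape as the observed counts
    new_array = np.zeros(a_array.shape, dtype=float)
    for ri, r in enumerate(a_array):        # r is row ri
        for ci, c in enumerate(a_array.T):  # c is column ci
            new_array[ri, ci] = c.sum() * r.sum() / a_array.sum()
    return new_array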

This works well enough, but I'd really like to adopt a more functional approach and define a function at the element level that I can vectorise (using np.vectorize) and apply without performing any potentially expensive loops in code.

My problem is that the information required at element level isn't enough to generate the required output - I'm trying to figure out how to access the aggregate sum values from within the (presumably) element-level function - is this simply not possible, or is there a functional pattern I'm not yet aware of that fits this type of aggregate-reliant condition?

Solution

You can do this with numpy built-ins using broadcasting. Broadcasting lets you combine two arrays of different (but compatible) shapes element-wise, without making explicit copies of the data or writing Python loops.
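
As a rough sketch of what "different shapes" means here: NumPy lines the shapes up from the trailing dimension, and each pair of dimensions must either match or contain a 1. The snippet below uses `np.broadcast_shapes` (available in NumPy 1.20 and later) and `np.broadcast_to`; the specific shapes are arbitrary examples.

```python
import numpy as np

# (3,) and (3, 1) are compatible: the result has shape (3, 3).
print(np.broadcast_shapes((3,), (3, 1)))  # (3, 3)

# (3,) and (4,) are not: the trailing dimensions differ and neither is 1.
try:
    np.broadcast_shapes((3,), (4,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False

# Broadcasting does not copy data: broadcast_to returns a read-only view
# whose stretched axis has a stride of 0.
row = np.arange(3)
view = np.broadcast_to(row, (3, 3))
print(view.strides[0])  # 0
```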

We can solve your problem by creating two vectors representing the row and column sums respectively, and 'multiplying' them together, which will broadcast them into a correctly sized and shaped array.

The best introduction to this topic I know of is the talk Losing Your Loops: Fast Numerical Computing with NumPy by Jake VanderPlas. It contains visual examples that I find essential for wrapping your head around broadcasting.

Here's a simple example:

IN:

import numpy as np
a = np.arange(3)
b = np.reshape(np.arange(3), [3, 1])
print('a =', a)
print('b = ')
print(b)
print('a+b = ')
print(a+b)

OUT:

a = [0 1 2]
b =
[[0]
 [1]
 [2]]
a+b =
[[0 1 2]
 [1 2 3]
 [2 3 4]]
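
The same stretching applies to multiplication, which is the operation the chi-squared case needs. As a minimal sketch reusing the `a` and `b` shapes from the example above, the broadcast product of a column vector and a row vector is the outer product, which can be cross-checked against `np.outer`:

```python
import numpy as np

a = np.arange(3)                      # shape (3,)
b = np.reshape(np.arange(3), [3, 1])  # shape (3, 1)

product = a * b                       # broadcasts to shape (3, 3)
print(product)
# [[0 0 0]
#  [0 1 2]
#  [0 2 4]]

# product[i, j] == b[i, 0] * a[j], i.e. the outer product.
print(np.array_equal(product, np.outer(b, a)))  # True
```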

Applied to your problem, we create two vectors representing the row and column sums respectively and 'multiply' them together, which broadcasts them into a correctly sized and shaped array:

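import numpy as np

def gen_expected(array: np.ndarray) -> np.ndarray:
    # Column sums, shape (n_cols,)
    col_sums = np.sum(array, axis=0)
    # Row sums, shape (n_rows,)
    row_sums = np.sum(array, axis=1)
    # np.reshape returns a new array, so the result must be assigned
    row_sums = np.reshape(row_sums, [len(row_sums), 1])
    # (n_rows, 1) * (n_cols,) broadcasts to (n_rows, n_cols)
    return (col_sums * row_sums) / np.sum(array)
# NOTE: the result has the same shape as the input, so it is not transposed.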
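
For a quick sanity check, here is a hedged sketch of two equivalent variants, compared against the loop version from the question. The names `expected_keepdims`, `expected_vectorized`, and `expected_loops`, and the 2x3 table of counts, are made up for this illustration. The `np.vectorize` variant shows how an element-level function can receive the aggregates as extra broadcast arguments; NumPy's documentation notes that `np.vectorize` is provided for convenience rather than performance and is essentially a for loop, so the plain broadcasting variant is usually the better choice.

```python
import numpy as np

def expected_keepdims(observed):
    # keepdims=True leaves the summed axis in place with length 1, so the
    # shapes are (n_rows, 1) and (1, n_cols) and no reshape is needed.
    row_sums = observed.sum(axis=1, keepdims=True)
    col_sums = observed.sum(axis=0, keepdims=True)
    return row_sums * col_sums / observed.sum()

# Element-level function: it receives the aggregates as arguments.
_cell = np.vectorize(lambda row_sum, col_sum, total: row_sum * col_sum / total)

def expected_vectorized(observed):
    row_sums = observed.sum(axis=1, keepdims=True)
    col_sums = observed.sum(axis=0, keepdims=True)
    # np.vectorize broadcasts its inputs the same way arithmetic does.
    return _cell(row_sums, col_sums, observed.sum())

def expected_loops(observed):
    # Reference implementation: the nested loops from the question.
    out = np.zeros(observed.shape, dtype=float)
    for ri, r in enumerate(observed):
        for ci, c in enumerate(observed.T):
            out[ri, ci] = c.sum() * r.sum() / observed.sum()
    return out

observed = np.array([[10, 20, 30],
                     [20, 20, 20]])  # hypothetical counts

result = expected_keepdims(observed)
print(result)
# [[15. 20. 25.]
#  [15. 20. 25.]]
print(result.shape == observed.shape)                      # True
print(np.allclose(result, expected_loops(observed)))       # True
print(np.allclose(result, expected_vectorized(observed)))  # True
```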
