在 numba 中使用 numpy.vstack [英] Using numpy.vstack in numba

查看:89
本文介绍了在 numba 中使用 numpy.vstack的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我一直在尝试优化一些从一些数组数据计算统计误差度量的代码.该指标称为连续排名概率得分 (CRPS).

我一直在使用 Numba 来尝试加快此计算中所需的双 for 循环,但是我遇到了 numpy.vstack 函数的问题.根据我从文档这里中的理解,应该支持 vstack() 函数,但是当我运行以下代码时出现错误.

def crps_hersbach_numba(obs, fcst_ens, remove_neg=False, remove_zero=False):"""按照公式 25-27 计算连续排名概率分数 (CRPS)赫斯巴赫等人.(2000)参数----------观察:1D ndarry每个开始日期的观察数组fcst_ens:二维数组维度 n x M 的集合预测数组,其中 n = 开始日期的数量和M = 合奏成员的数量.remove_neg: 布尔值如果为 True,当在观察到的 OR 集合中的第 i 个位置发现负值时数组,观察到的 AND 集合数组的第 i 个值在计算.remove_zero: 布尔值如果为真,当在观察到的 OR 集合中的第 i 个位置找到零值时数组,观察到的 AND 集合数组的第 i 个值在计算.退货-------字典字典包含许多*实验*输出,包括:- ["crps"] 每 n 个开始日期的 crps 值的一维 ndarray.- ["crpsMean1"] crps 值的算术平均值.- ["crpsMean2"] 使用 eqn 表示 crps.28 赫斯巴赫 (2000).笔记-----**NaN 和 inf 处理:** 如果 obs 或 fcst_ens 中的任何值是 NaN 或 inf,则fcst_ens(对于所有合奏成员)和 obs 中的相应行将被删除.参考----------- Hersbach, H. (2000) 连续排名可移植性分数的分解对于集合预测系统,天气和预测,15, 559-570."""# 处理数据obs,fcst_ens =treat_data(obs,fcst_ens,remove_neg=remove_neg,remove_zero=remove_zero)# 设置参数n = fcst_ens.shape[0] # 预测开始日期的数量m = fcst_ens.shape[1] # ensemble 成员的数量# 创建 pi 的向量p = np.linspace(0, m, m + 1)pi = p/米crps_numba = np.zeros(n)@njitdef calculate_crps():# 循环 fcst 开始时间对于我在 prange(n):# 初始化用于存储输出的向量a = np.zeros(m - 1)b = np.zeros(m - 1)# 验证分析(或 obs)xa = obs[i]# 集成 fcst CDFx = np.sort(fcst_ens[i, :])# 处理 0 <我<m [因此,对于 m = 51,将循环 50 次]对于 prange(m - 1) 中的 j:# 规则1如果 xa >x[j + 1]:a[j] = x[j + 1] - x[j]b[j] = 0# 规则 2如果 x[j] x[m - 1]:上午 = xa - x[m - 1]BM = 0别的:上午 = 0BM = 0# 结合完整的 &b 向量,包括异常值a = np.concatenate((np.array([0]), a, np.array([am])))# a = np.insert(a, 0, a1)# a = np.append(a, am)a = np.concatenate((np.array([0]), a, np.array([bm])))# b = np.insert(b, 0, b1)# b = np.append(b, bm)# 填充 a_mat 和 b_mat如果我 == 0:a_mat = ab_mat = b别的:a_mat = np.vstack((a_mat, a))b_mat = np.vstack((b_mat, b))# 计算单个开始时间的 crpscrps_numba[i] = ((a * pi ** 2) + (b * (1 - pi) ** 2)).sum()返回 crps_numba、a_mat、b_matcrps, a_mat, b_mat = calculate_crps()打印(crps)# 计算均值 crps 作为跨 crps[i] 的简单均值crps_mean_method1 = np.mean(crps)# 计算从 eqn 开始的所有开始时间的平均 crps.28 赫斯巴赫 (2000)abar = np.mean(a_mat, 0)bbar = np.mean(b_mat, 0)crps_mean_method2 = ((abar * pi ** 2) + (bbar * (1 - pi) ** 2)).sum()# 将数组输出为字典输出 = {'crps':crps,'crpsMean1':crps_mean_method1,'crpsMean2':crps_mean_method2}返回输出

我得到的错误是这样的:

无法统一array(float64, 1d, C) 和array(float64, 2d, C) for 'a_mat',定义在*path文件test.py",第 86 行:def calculate_crps():<来源省略>如果我 == 0:a_mat = a^[1] 期间:在 *path 处输入赋值文件test.py",第 89 行:def calculate_crps():<来源省略>别的:a_mat = np.vstack((a_mat, a))^这通常不是 Numba 本身的问题,而是经常由使用不受支持的功能或解决类型的问题.

我只是想知道我哪里出错了.似乎 vstack 函数应该可以工作,但也许我遗漏了一些东西.

解决方案

我只是想知道我哪里出错了.似乎 vstack 函数应该可以工作,但也许我遗漏了一些东西.

TL;DR:问题不在于 vstack.问题是您的代码路径试图将不同类型的数组分配给同一个变量(这会引发统一异常).

问题出在这里:

# 填充 a_mat 和 b_mat如果我 == 0:a_mat = ab_mat = b别的:a_mat = np.vstack((a_mat, a))b_mat = np.vstack((b_mat, b))

在第一个代码路径中,您将一个 1d c-contigous float64 数组分配给 a_matb_mat,而在 else 中,它是一个 2dc 连续 float64 数组.这些类型不兼容,因此 numba 会引发错误.numba 代码不像 Python 代码那样工作,有时会很棘手,因为在为变量赋值时,无论您拥有什么类型,都无关紧要.然而,在最近的版本中,numba 异常消息变得更好了,所以如果您知道异常提示什么,您通常可以快速找出问题所在.

更长的解释

问题是 numba 隐式推断变量的类型.例如:

from numba import njit@njit定义函数(arr):a = arr返回一个

这里我没有输入函数,所以我需要运行一次:

<预><代码>>>>将 numpy 导入为 np>>>功能(np.zeros(5))数组([0., 0., 0., 0., 0.])

然后您可以检查类型:

<预><代码>>>>func.inspect_types()

func (array(float64, 1d, C),)--------------------------------------------------------------------------------# 文件:<ipython-input-4-02470248b065># --- 第 3 行 ---# 标签 0@njit# --- 第 4 行 ---定义函数(arr):# --- 第 5 行 ---# arr = arg(0, name=arr) :: array(float64, 1d, C)# a = arr :: array(float64, 1d, C)# 删除a = arr# --- 第 6 行 ---# $0.3 = cast(value=a) :: array(float64, 1d, C)# 删除# 返回 $0.3返回一个

如您所见,变量 a 是为 array(float64, 1d, C) 类型的输入键入的 array(float64, 1d, C).

现在,让我们使用 np.vstack 代替:

from numba import njit将 numpy 导入为 np@njit定义函数(arr):a = np.vstack((arr, arr))返回一个

以及编译它的强制性第一次调用:

<预><代码>>>>功能(np.zeros(5))数组([[0., 0., 0., 0., 0.],[0., 0., 0., 0., 0.]])

然后再次检查类型:

func (array(float64, 1d, C),)--------------------------------------------------------------------------------# 文件:<ipython-input-11-f0214d5181c6># --- 第 4 行 ---# 标签 0@njit# --- 第 5 行 ---定义函数(arr):# --- 第 6 行 ---# arr = arg(0, name=arr) :: array(float64, 1d, C)# $0.1 = global(np: <module 'numpy'>) :: Module(<module 'numpy'>)# $0.2 = getattr(value=$0.1, attr=vstack) :: Function()# 德尔 $0.1# $0.5 = build_tuple(items=[Var(arr, <ipython-input-11-f0214d5181c6> (6)), Var(arr, <ipython-input-11-f0214d5181c6> (6))]) :: tuple(数组(float64, 1d, C) x 2)#德尔阿# $0.6 = 调用 $0.2($0.5, func=$0.2, args=[Var($0.5, <ipython-input-11-f0214d5181c6> (6))], kws=(), vararg=None) :: (tuple(array)(float64, 1d, C) x 2),) ->数组(float64,2d,C)# 德尔 $0.5# 德尔 $0.2# a = $0.6 :: array(float64, 2d, C)# 德尔 $0.6a = np.vstack((arr, arr))# --- 第 7 行 ---# $0.8 = cast(value=a) :: array(float64, 2d, C)# 删除# 返回 $0.8返回一个

这次a被输入为array(float64, 2d, C),用于输入array(float64, 1d, C).

你可能问过自己为什么我要谈论这个.让我们看看如果您尝试有条件地分配给 a 会发生什么:

from numba import njit将 numpy 导入为 np@njitdef func(arr, 条件):如果条件:a = np.vstack((arr, arr))别的:a = arr返回一个

如果您现在运行代码:

<预><代码>>>>功能(np.zeros(5),真)

TypingError: Failed at nopython (nopython frontend)无法统一数组(float64, 2d, C) 和数组(float64, 1d, C) 的'a',定义在<ipython-input-16-f4bd9a4f377a>(7)文件<ipython-input-16-f4bd9a4f377a>",第7行:def func(arr, 条件):<来源省略>如果条件:a = np.vstack((arr, arr))^[1] 期间:在<ipython-input-16-f4bd9a4f377a>处输入赋值;(9)文件<ipython-input-16-f4bd9a4f377a>",第 9 行:def func(arr, 条件):<来源省略>别的:a = arr^

这正是您遇到的问题,这是因为对于一组固定的输入类型,变量需要在 numba 中只有一种类型.并且因为 dtype、等级(维数)和连续属性都是类型的一部分,所以不能将具有不同维数的数组分配给同一个变量.

请注意,您可以扩展尺寸以使其工作并再次从结果中压缩不必要的尺寸(可能不是很好,但它应该以最少的更改"解决问题:

from numba import njit将 numpy 导入为 np@njitdef func(arr, 条件):如果条件:a = np.vstack((arr, arr))别的:a = np.expand_dims(arr, 0)返回一个>>>功能(np.zeros(5),假)array([[0., 0., 0., 0., 0.]]) # <-- 二维数组!>>>功能(np.zeros(5),真)数组([[0., 0., 0., 0., 0.],[0., 0., 0., 0., 0.]])

So I have been trying to optimize some code that calculates a statistical error metric from some array data. The metric is called the Continuous Rank Probability Score (CRPS).

I have been using Numba to try to speed up the double for loop required in this calculation, but I have been having an issue with the numpy.vstack function. From what I understand from the docs here, the vstack() function should be supported, but when I run the following code I get an error.

def crps_hersbach_numba(obs, fcst_ens, remove_neg=False, remove_zero=False):
    """Calculate the the continuous ranked probability score (CRPS) as per equation 25-27 in
    Hersbach et al. (2000)

    Parameters
    ----------
    obs: 1D ndarry
        Array of observations for each start date
    fcst_ens: 2D ndarray
        Array of ensemble forecast of dimension n x M, where n = number of start dates and
        M = number of ensemble members.

    remove_neg: bool
        If True, when a negative value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    remove_zero: bool
        If true, when a zero value is found at the i-th position in the observed OR ensemble
        array, the i-th value of the observed AND ensemble array are removed before the
        computation.

    Returns
    -------
    dict
        Dictionary contains a number of *experimental* outputs including:
            - ["crps"] 1D ndarray of crps values per n start dates.
            - ["crpsMean1"] arithmetic mean of crps values.
            - ["crpsMean2"] mean crps using eqn. 28 in Hersbach (2000).

    Notes
    -----
    **NaN and inf treatment:** If any value in obs or fcst_ens is NaN or inf, then the
    corresponding row in both fcst_ens (for all ensemble members) and in obs will be deleted.

    References
    ----------
    - Hersbach, H. (2000) Decomposition of the Continuous Ranked Porbability Score
      for Ensemble Prediction Systems, Weather and Forecasting, 15, 559-570.
    """
    # Treating the Data
    obs, fcst_ens = treat_data(obs, fcst_ens, remove_neg=remove_neg, remove_zero=remove_zero)

    # Set parameters
    n = fcst_ens.shape[0]  # number of forecast start dates
    m = fcst_ens.shape[1]  # number of ensemble members

    # Create vector of pi's
    p = np.linspace(0, m, m + 1)
    pi = p / m

    crps_numba = np.zeros(n)

    @njit
    def calculate_crps():
        # Loop fcst start times
        for i in prange(n):

            # Initialise vectors for storing output
            a = np.zeros(m - 1)
            b = np.zeros(m - 1)

            # Verifying analysis (or obs)
            xa = obs[i]

            # Ensemble fcst CDF
            x = np.sort(fcst_ens[i, :])

            # Deal with 0 < i < m [So, will loop 50 times for m = 51]
            for j in prange(m - 1):

                # Rule 1
                if xa > x[j + 1]:
                    a[j] = x[j + 1] - x[j]
                    b[j] = 0

                # Rule 2
                if x[j] < xa < x[j + 1]:
                    a[j] = xa - x[j]
                    b[j] = x[j + 1] - xa

                # Rule 3
                if xa < x[j]:
                    a[j] = 0
                    b[j] = x[j + 1] - x[j]

            # Deal with outliers for i = 0, and i = m,
            # else a & b are 0 for non-outliers
            if xa < x[0]:
                a1 = 0
                b1 = x[0] - xa
            else:
                a1 = 0
                b1 = 0

            # Upper outlier (rem m-1 is for last member m, but python is 0-based indexing)
            if xa > x[m - 1]:
                am = xa - x[m - 1]
                bm = 0
            else:
                am = 0
                bm = 0

            # Combine full a & b vectors including outlier
            a = np.concatenate((np.array([0]), a, np.array([am])))
            # a = np.insert(a, 0, a1)
            # a = np.append(a, am)
            a = np.concatenate((np.array([0]), a, np.array([bm])))
            # b = np.insert(b, 0, b1)
            # b = np.append(b, bm)

            # Populate a_mat and b_mat
            if i == 0:
                a_mat = a
                b_mat = b
            else:
                a_mat = np.vstack((a_mat, a))
                b_mat = np.vstack((b_mat, b))

            # Calc crps for individual start times
            crps_numba[i] = ((a * pi ** 2) + (b * (1 - pi) ** 2)).sum()

        return crps_numba, a_mat, b_mat

    crps, a_mat, b_mat = calculate_crps()
    print(crps)
    # Calc mean crps as simple mean across crps[i]
    crps_mean_method1 = np.mean(crps)

    # Calc mean crps across all start times from eqn. 28 in Hersbach (2000)
    abar = np.mean(a_mat, 0)
    bbar = np.mean(b_mat, 0)
    crps_mean_method2 = ((abar * pi ** 2) + (bbar * (1 - pi) ** 2)).sum()

    # Output array as a dictionary
    output = {'crps': crps, 'crpsMean1': crps_mean_method1,
              'crpsMean2': crps_mean_method2}

    return output

The error I get is this:

Cannot unify array(float64, 1d, C) and array(float64, 2d, C) for 'a_mat', defined at *path

File "test.py", line 86:
    def calculate_crps():
        <source elided>
            if i == 0:
                a_mat = a
                ^

[1] During: typing of assignment at *path

File "test.py", line 89:
    def calculate_crps():
        <source elided>
            else:
                a_mat = np.vstack((a_mat, a))
                ^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

I just wanted to know where I am going wrong. It seems as though the vstack function should work but maybe I am missing something.

解决方案

I just wanted to know where I am going wrong. It seems as though the vstack function should work but maybe I am missing something.

TL;DR: It's not vstack that is the problem. The problem is that you have code-paths that try to assign different types of arrays to the same variable (which throws that unification exception).

The problem lies here:

# Populate a_mat and b_mat
if i == 0:
    a_mat = a
    b_mat = b
else:
    a_mat = np.vstack((a_mat, a))
    b_mat = np.vstack((b_mat, b))

In the first code-path you assign a 1d c-contigous float64 array to a_mat and b_mat and in the else it's a 2d c-contiguous float64 array. These types are not compatible and thus numba throws an error. It's sometimes tricky that numba code doesn't work like Python code, where it doesn't matter what types you have when you assign something to a variable. However in recent releases the numba exception messages got a lot better so if you know what the exception hints at you can normally quickly find out what's the problem.

Longer Explanation

The problem is that numba infers the types of your variables implicitly. For example:

from numba import njit

@njit
def func(arr):
    a = arr
    return a

Here I haven't typed the function so I need to run it once:

>>> import numpy as np
>>> func(np.zeros(5))
array([0., 0., 0., 0., 0.])

Then you can inspect the types:

>>> func.inspect_types()

func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-4-02470248b065>
# --- LINE 3 --- 
# label 0

@njit

# --- LINE 4 --- 

def func(arr):

    # --- LINE 5 --- 
    #   arr = arg(0, name=arr)  :: array(float64, 1d, C)
    #   a = arr  :: array(float64, 1d, C)
    #   del arr

    a = arr

    # --- LINE 6 --- 
    #   $0.3 = cast(value=a)  :: array(float64, 1d, C)
    #   del a
    #   return $0.3

    return a

As you can see the variable a is typed for an input of type array(float64, 1d, C) as array(float64, 1d, C).

Now, let's use np.vstack instead:

from numba import njit
import numpy as np

@njit
def func(arr):
    a = np.vstack((arr, arr))
    return a

And the mandatory first call to compile it:

>>> func(np.zeros(5))
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Then inspect the types again:

func (array(float64, 1d, C),)
--------------------------------------------------------------------------------
# File: <ipython-input-11-f0214d5181c6>
# --- LINE 4 --- 
# label 0

@njit

# --- LINE 5 --- 

def func(arr):

    # --- LINE 6 --- 
    #   arr = arg(0, name=arr)  :: array(float64, 1d, C)
    #   $0.1 = global(np: <module 'numpy'>)  :: Module(<module 'numpy'>)
    #   $0.2 = getattr(value=$0.1, attr=vstack)  :: Function(<function vstack at 0x000001DB7082A400>)
    #   del $0.1
    #   $0.5 = build_tuple(items=[Var(arr, <ipython-input-11-f0214d5181c6> (6)), Var(arr, <ipython-input-11-f0214d5181c6> (6))])  :: tuple(array(float64, 1d, C) x 2)
    #   del arr
    #   $0.6 = call $0.2($0.5, func=$0.2, args=[Var($0.5, <ipython-input-11-f0214d5181c6> (6))], kws=(), vararg=None)  :: (tuple(array(float64, 1d, C) x 2),) -> array(float64, 2d, C)
    #   del $0.5
    #   del $0.2
    #   a = $0.6  :: array(float64, 2d, C)
    #   del $0.6

    a = np.vstack((arr, arr))

    # --- LINE 7 --- 
    #   $0.8 = cast(value=a)  :: array(float64, 2d, C)
    #   del a
    #   return $0.8

    return a

This time a is typed as array(float64, 2d, C) for an input of array(float64, 1d, C).

You may have asked yourself why I'm talking about that. Let's see what happens if you try to conditionally assign to a:

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = arr
    return a

If you now run the code:

>>> func(np.zeros(5), True)

TypingError: Failed at nopython (nopython frontend)
Cannot unify array(float64, 2d, C) and array(float64, 1d, C) for 'a', defined at <ipython-input-16-f4bd9a4f377a> (7)

File "<ipython-input-16-f4bd9a4f377a>", line 7:
def func(arr, condition):
    <source elided>
    if condition:
        a = np.vstack((arr, arr))
        ^

[1] During: typing of assignment at <ipython-input-16-f4bd9a4f377a> (9)

File "<ipython-input-16-f4bd9a4f377a>", line 9:
def func(arr, condition):
    <source elided>
    else:
        a = arr
        ^

That's exactly the problem you have and it's because variables need to have one and only one type in numba for a fixed set of input types. And because the dtype, the rank (number of dimensions), and the contiguous property are all part of the type you cannot assign arrays with different dimensions to the same variable.

Note that you could expand the dimensions to make it works and squeeze the unnecessary dimensions again from the result (probably not very nice but it should solve the problem with minimal amount of "changes":

from numba import njit
import numpy as np

@njit
def func(arr, condition):
    if condition:
        a = np.vstack((arr, arr))
    else:
        a = np.expand_dims(arr, 0)
    return a

>>> func(np.zeros(5), False)
array([[0., 0., 0., 0., 0.]])  # <-- 2d array!
>>> func(np.zeros(5), True)
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

这篇关于在 numba 中使用 numpy.vstack的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆