How to efficiently create conditional column arrays using NumPy?


Question

The objective is to create an array of all (x, y, z) rows that fulfil the condition (x >= y) and (y >= z).

One naive approach that does the job is a nested for loop, as shown below:

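import numpy as np

tot_length = 200
steps = 0.1
start_val = 0.0
list_no = np.arange(start_val, tot_length, steps)

# Start from an empty (0, 3) array so no spurious [0, 0, 0] row is kept
a = np.empty(shape=(0, 3))
for x in list_no:
    for y in list_no:
        for z in list_no:
            if (x >= y) and (y >= z):
                # np.append copies the whole array on every call
                a = np.append(a, [[x, y, z]], axis=0)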

Although this raises no memory error, the execution time is very slow.
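As a rough illustration of where the time goes, the loop can be tightened in two ways: `np.append` returns a fresh copy of the array on every call, so collecting rows in a Python list and converting once avoids that cost, and because `list_no` is ascending, the inner loops only need to visit values that can still satisfy the condition. The function name `triples_by_loop` is made up for this sketch; it is still a pure-Python loop, so it only suits small inputs.

```python
import numpy as np

def triples_by_loop(start_val, tot_length, steps):
    """Collect all rows (x, y, z) with x >= y >= z (hypothetical helper)."""
    list_no = np.arange(start_val, tot_length, steps)  # ascending values
    rows = []
    for i, x in enumerate(list_no):
        for j in range(i + 1):          # only y values with y <= x
            y = list_no[j]
            for k in range(j + 1):      # only z values with z <= y
                rows.append((x, y, list_no[k]))
    # Convert once at the end instead of calling np.append per row
    return np.array(rows).reshape(-1, 3)

print(triples_by_loop(1, 3, 1))
```

With the small test parameters (`start_val=1`, `tot_length=3`, `steps=1`) this yields the four rows (1, 1, 1), (2, 1, 1), (2, 2, 1) and (2, 2, 2).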

Another approach is the code below. However, it only works reliably as long as tot_length is less than 100; beyond that, memory issues arise, as reported here.

import numpy as np

tot_length = 200
steps = 0.1
start_val = 0.0
list_no = np.arange(start_val, tot_length, steps)
# Dense grid: three arrays of len(list_no)**3 values each
arr = np.meshgrid(*[list_no for _ in range(3)])
a = np.array(list(map(np.ravel, arr))).transpose()
num_rows, num_cols = a.shape

# Each row of a_list names three column indices to compare (here just [0, 1, 2])
a_list = np.arange(num_cols).reshape((-1, 3))
for x in range(len(a_list)):
    a = a[(a[:, a_list[x, 0]] >= a[:, a_list[x, 1]]) & (a[:, a_list[x, 1]] >= a[:, a_list[x, 2]])]
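A back-of-the-envelope calculation suggests why the dense grid runs out of memory. The sketch below assumes `list_no` has 2000 elements (the length of `np.arange(0.0, 200, 0.1)`) and that values are stored as 8-byte float64; `dense_grid_bytes` is a hypothetical helper, not a NumPy function.

```python
def dense_grid_bytes(n, dims=3, itemsize=8):
    """Bytes held by `dims` dense coordinate arrays of n**dims values each."""
    return dims * itemsize * n ** dims

n = 2000  # assumed length of list_no for tot_length=200, steps=0.1
print(dense_grid_bytes(n) / 1e9, "GB for the three dense float64 grids")
print(dense_grid_bytes(n, dims=3, itemsize=1) / 3 / 1e9, "GB for one boolean mask")
```

Under these assumptions the three coordinate arrays alone need about 192 GB, before the stacked and transposed copy is made, while a single boolean mask of the same shape needs about 8 GB.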

I would appreciate any suggestion that balances overall execution time against memory use. Suggestions using Pandas are also welcome, if that makes things work.

To check whether a proposed solution produces the intended output, the following parameters

tot_length=3
steps=1
start_val=1

should produce the output

1   1   1
2   1   1
2   2   1
2   2   2

Recommended answer

Would something like this work?

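import numpy as np

tot_length = 200
steps = 0.1
list_no = np.arange(0.0, tot_length, steps)
# sparse=True returns broadcastable 1-D-like grids instead of dense cubes
x, y, z = np.meshgrid(*[list_no for _ in range(3)], sparse=True)
# a is a tuple of three index arrays, one per axis of the boolean mask
a = ((x >= y) & (y >= z)).nonzero()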

This will still use about 8 GB of memory for the intermediate boolean array (2000^3 elements at one byte each), but it avoids the repeated calls to np.append, which are slow.
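Two details may be worth spelling out. First, `nonzero()` returns indices rather than values, and `np.meshgrid` defaults to `indexing="xy"`, which swaps the first two axes; passing `indexing="ij"` keeps the axes in x, y, z order so the indices can be mapped straight back to `list_no`. Second, the full result has n(n+1)(n+2)/6 rows, roughly 1.3 billion for n = 2000, so the result itself may not fit in memory. The sketch below shows both ideas under the assumption that `list_no` is ascending; `ordered_triples` and `iter_ordered_triples` are names invented here, and the per-x generator is a variation on the answer, not part of it.

```python
import numpy as np

def ordered_triples(list_no):
    """Mask-based version: all rows (x, y, z) with x >= y >= z."""
    x, y, z = np.meshgrid(list_no, list_no, list_no,
                          sparse=True, indexing="ij")
    ix, iy, iz = ((x >= y) & (y >= z)).nonzero()
    # Map the indices back to the actual values
    return np.column_stack((list_no[ix], list_no[iy], list_no[iz]))

def iter_ordered_triples(list_no):
    """Yield one block of rows per x value, avoiding the n**3 mask.

    Assumes list_no is sorted ascending, so x >= y >= z is equivalent
    to index_x >= index_y >= index_z.
    """
    n = len(list_no)
    # All index pairs (j, k) with j >= k, in row-major order
    j_all, k_all = np.tril_indices(n)
    for i in range(n):
        m = (i + 1) * (i + 2) // 2  # number of pairs with j <= i
        j, k = j_all[:m], k_all[:m]
        yield np.column_stack((np.full(m, list_no[i]), list_no[j], list_no[k]))

small = np.arange(1, 3, 1)
print(ordered_triples(small))
print(np.concatenate(list(iter_ordered_triples(small))))
```

Both calls print the four rows listed in the question for the test parameters. The generator lets each block be processed or written to disk before the next one is built, which trades a Python-level loop over x for a much smaller peak memory footprint.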
