Array reclassification with numpy

Problem description

I have a large (50000 x 50000) 64-bit integer NumPy array containing 10-digit numbers. There are about 250,000 unique numbers in the array.

I have a second reclassification table which maps each unique value from the first array to an integer between 1 and 100. My hope would be to reclassify the values from the first array to the corresponding values in the second.

I've tried two methods of doing this, and while they work, they are quite slow. In both methods I create a blank (zeros) array of the same dimensions.

new_array = np.zeros(old_array.shape)

First method:

for old_value, new_value in lookup_array:
    new_array[old_array == old_value] = new_value

Second method, where the lookup table is a pandas DataFrame named lookup_table with the columns "Old" and "New":

for new_value, group in lookup_table.groupby("New"):
    # np.isin keeps old_array's shape, so the mask can index new_array directly
    new_array[np.isin(old_array, group["Old"].to_numpy())] = new_value

Is there a faster way to reclassify values?

Recommended answer

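Store the lookup table as an array in which each index is an old value and the element at that index is its mapped value. Note that this array has idx.max() + 1 elements rather than 250,000, so direct indexing is practical only when the old values are small enough to serve as indices. For example, if you have something like: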

lookups = [(old_value_1, new_value_1), (old_value_2, new_value_2), ...]

Then you can do:

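# Split the (old, new) pairs into an index array and a value array
idx, val = np.asarray(lookups).T
# uint8 is enough for values 1..100 and keeps the table small
lookup_array = np.zeros(idx.max() + 1, dtype=np.uint8)
lookup_array[idx] = val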

When you get that, you can get your transformed array simply as:

new_array = lookup_array[old_array]
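
With 10-digit values like those in the question, a directly indexed table would need roughly ten billion entries. One possible alternative, which is not part of the answer above, is to sort the table and locate each value with np.searchsorted. The following is a minimal sketch under that assumption; the small old_array and lookups values are made up for illustration, and 0 is used as an arbitrary marker for values missing from the table.

```python
import numpy as np

# Hypothetical small data standing in for the real 50000 x 50000 array.
old_array = np.array([[1000000007, 2000000011],
                      [2000000011, 3000000019]], dtype=np.int64)
lookups = [(3000000019, 7), (1000000007, 42), (2000000011, 99)]

old_values, new_values = np.asarray(lookups, dtype=np.int64).T

# Sort the table by old value so it can be binary-searched.
order = np.argsort(old_values)
sorted_old = old_values[order]
sorted_new = new_values[order].astype(np.uint8)  # 1..100 fits in uint8

# Position of every element of old_array within the sorted table.
pos = np.searchsorted(sorted_old, old_array)

# Clip so that values larger than every table entry still index safely,
# then check which positions are exact matches.
pos = np.clip(pos, 0, len(sorted_old) - 1)
found = sorted_old[pos] == old_array

# Values absent from the table are set to 0 (an assumption of this sketch).
new_array = np.where(found, sorted_new[pos], 0).astype(np.uint8)
print(new_array)
```

The intermediate index array is as large as the input, so for an array of the size described in the question it may be worth applying this to a block of rows at a time.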
