Array reclassification with numpy
Problem description
I have a large (50000 x 50000) 64-bit integer NumPy array containing 10-digit numbers. There are about 250,000 unique numbers in the array.
I have a second, reclassification table that maps each unique value in the first array to an integer between 1 and 100. I want to replace every value in the first array with its corresponding value from this table.
I've tried two methods. Both work, but they are very slow. In both, I first create an output array of zeros with the same shape:
import numpy as np
new_array = np.zeros(old_array.shape, dtype=np.uint8)  # new values 1..100 fit in one byte
First method:
for old_value, new_value in lookup_array:
    # each (old, new) pair costs one full comparison pass over old_array
    new_array[old_array == old_value] = new_value
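A toy-scale run of the first method, assuming `lookup_array` is an iterable of `(old_value, new_value)` pairs; the small `old_array` and the pairs are made-up illustration data, and the `uint8` output dtype is an assumption (it is enough for targets 1 to 100). It makes the cost visible: each pair triggers a full scan of `old_array`, so the work grows with (number of unique values) × (number of elements), roughly 250,000 × 2.5 billion comparisons at full size.

```python
import numpy as np

# Made-up illustration data: a small 2-D array of 10-digit codes.
old_array = np.array([[1000000001, 1000000002],
                      [1000000003, 1000000001]], dtype=np.int64)
# (old_value, new_value) pairs; new values lie in 1..100.
lookup_array = [(1000000001, 7), (1000000002, 42), (1000000003, 99)]

# uint8 holds 1..100 and is 8x smaller than the default float64.
new_array = np.zeros(old_array.shape, dtype=np.uint8)
for old_value, new_value in lookup_array:
    # Each iteration compares every element of old_array once.
    new_array[old_array == old_value] = new_value

print(new_array)
```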
Second method, where the lookup table is a pandas DataFrame, lookup_table, with the columns "Old" and "New":
for new_value, group in lookup_table.groupby("New"):
    # np.isin keeps old_array's 2-D shape (np.in1d would return a flattened mask)
    new_array[np.isin(old_array, group["Old"].to_numpy())] = new_value
Is there a faster way to reclassify the values?
Recommended answer
Store the lookup table as an array indexed by old value, where the element at each index is that value's new value. Note that its length is the largest old value plus one, not the number of unique values. For example, if you have something like:
lookups = [(old_value_1, new_value_1), (old_value_2, new_value_2), ...]
Then you can do:
import numpy as np
idx, val = np.asarray(lookups).T
lookup_array = np.zeros(idx.max() + 1, dtype=np.uint8)  # new values 1..100 fit in one byte
lookup_array[idx] = val
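A small, self-contained sketch of building this table, with made-up pairs. The `uint8` dtype is an assumption (the answer uses the float64 default). The keys here are 6-digit to keep the demo light; the point is that the table length follows the largest key, so with the question's 10-digit keys it can approach 10^10 entries (about 10 GB even at one byte each).

```python
import numpy as np

# Made-up (old_value, new_value) pairs; only three keys.
lookups = [(100001, 7), (100002, 42), (100003, 99)]

idx, val = np.asarray(lookups, dtype=np.int64).T
# Dense table indexed directly by old value; its length is idx.max() + 1,
# which depends on the largest key, not on how many keys there are.
lookup_array = np.zeros(idx.max() + 1, dtype=np.uint8)
lookup_array[idx] = val

print(lookup_array.size)  # 100004 entries for only 3 keys
```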
With that table in place, the transformed array is a single integer-array (fancy) indexing operation:
new_array = lookup_array[old_array]
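If the largest key makes the dense table too large, one alternative (a swapped-in technique, not part of the answer above) is to keep only the roughly 250,000 keys sorted and locate each element with `np.searchsorted`: a binary search per element instead of memory proportional to the largest key. A sketch with made-up data, assuming every value in `old_array` appears in the table (the check raises otherwise). At 50,000 × 50,000 the temporary `pos` array is int64, so you would probably process blocks of rows.

```python
import numpy as np

# Made-up data: 10-digit codes and their classes in 1..100.
old_array = np.array([[1000000003, 1000000001],
                      [1000000002, 1000000003]], dtype=np.int64)
keys = np.array([1000000002, 1000000001, 1000000003], dtype=np.int64)
classes = np.array([42, 7, 99], dtype=np.uint8)

# Sort the table once by key so it can be binary-searched.
order = np.argsort(keys)
sorted_keys = keys[order]
sorted_classes = classes[order]

# Index of each element of old_array within sorted_keys (same shape as old_array).
pos = np.searchsorted(sorted_keys, old_array)

# Verify every value was actually found; clip so missing large values don't index out of range.
pos_clipped = np.minimum(pos, sorted_keys.size - 1)
if not np.array_equal(sorted_keys[pos_clipped], old_array):
    raise KeyError("old_array contains values missing from the lookup table")

new_array = sorted_classes[pos]
print(new_array)
```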