Fast sort multidimensional array


Question

I need to sort a multidimensional array according to the values in its first sub-array, and as fast as possible (the line is executed millions of times).

Below is my original line, followed by my attempt at improving its performance, which does not work. As far as I can see, my numpy approach only sorts the first sub-array properly and none of the remaining ones.

What am I doing wrong, and how can I improve the performance of the sorting?

import numpy as np

# Generate some random data.
# I receive the actual data as a list, hence the .tolist()
aa = np.random.rand(10, 2000).tolist()

# This is the original line I need to process faster.
b1 = zip(*sorted(zip(*aa), key=lambda x: x[0]))

# This is my attempt at improving the above line's performance
b2 = np.sort(np.asarray(aa).T, axis=0).T

# Check if all sub-arrays are equal
for a, b in zip(*[b1, b2]):
    print(np.array_equal(a, b))


Recommended answer

I'm still a novice when it comes to lambdas, but from what I understand of your code: the lambda uses x[0] as the sort key, so the columns of aa (the tuples produced by zip(*aa)) are reordered by their first value. In NumPy terms, that translates to getting the sort indices for the first row of the array version and then indexing into every row with them (each element of aa becomes a row of the array a). That's basically column indexing. Also, sorted is stable, i.e. it keeps the original order of elements with identical keys. So, to reproduce its result exactly, we need a stable sort: argsort(kind='mergesort').

Thus, we can simply do -

a[:, a[0].argsort(kind='mergesort')] # a = np.array(aa) 
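
To make the column indexing concrete, here is a minimal sketch on a small hand-made array. The data and the names `order` and `sorted_cols` are illustrative assumptions, not part of the original question. The first row contains a tie (two 1s) so the effect of the stable sort is visible.

```python
import numpy as np

# Small illustrative array: row 0 holds the sort keys, including a tie.
a = np.array([[3, 1, 2, 1],
              [10, 20, 30, 40],
              [5, 6, 7, 8]])

# Stable argsort of the first row: tied keys keep their original order,
# so column 1 comes before column 3.
order = a[0].argsort(kind='mergesort')

# Reorder the columns of every row with the same index array.
sorted_cols = a[:, order]

print(order)
print(sorted_cols)
```

Every row is rearranged with the same permutation, so the values that shared a column before the sort still share a column afterwards.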

Your NumPy code does nothing of that sort, which is why it does not give the correct results: np.sort with axis=0 on the transposed array sorts every sub-array independently instead of reordering all of them by the first one.
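
As a rough check, the sketch below compares the three approaches on random data shaped like the question's. The seeded generator and the names `b_wrong`, `matches`, `first_row_ok` and `other_rows_ok` are assumptions added for illustration; `list(...)` is wrapped around `zip` so the result can be reused under Python 3.

```python
import numpy as np

# Random data in the same shape as in the question, received as a list.
rng = np.random.default_rng(0)
aa = rng.random((10, 2000)).tolist()

# Original pure-Python line (materialised as a list for Python 3).
b1 = list(zip(*sorted(zip(*aa), key=lambda x: x[0])))

# Column indexing with a stable argsort of the first row.
a = np.asarray(aa)
b2 = a[:, a[0].argsort(kind='mergesort')]

# The question's attempt: sorts each sub-array on its own.
b_wrong = np.sort(a.T, axis=0).T

matches = np.array_equal(np.array(b1), b2)
first_row_ok = np.array_equal(b_wrong[0], b2[0])
other_rows_ok = np.array_equal(b_wrong[1:], b2[1:])

print(matches, first_row_ok, other_rows_ok)
```

The first row agrees in both NumPy versions because it is simply sorted either way; the remaining rows differ because np.sort discards the link between a key and the rest of its column. If the data can stay in an array between calls, skipping the list round trip is likely to save additional time, though that depends on the surrounding code.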
