如何比快速排序更快地对整数数组进行排序? [英] How to sort an array of integers faster than quicksort?
问题描述
使用numpy的quicksort排序整数数组已成为 我算法的瓶颈.不幸的是,numpy没有 基数排序. 尽管计数排序在numpy中是单线的:
Sorting an array of integers with numpy's quicksort has become the bottleneck of my algorithm. Unfortunately, numpy does not have radix sort yet. Although counting sort would be a one-liner in numpy:
np.repeat(np.arange(1+x.max()), np.bincount(x))
请参见已接受的答案,如何对这种python计数排序向量进行矢量化处理,以使其速度尽可能的快?一个问题,整数
在我的应用程序中可以从0
运行到2**32
.
see the accepted answer to the How can I vectorize this python count sort so it is absolutely as fast as it can be? question, the integers
in my application can run from 0
to 2**32
.
我坚持使用quicksort吗?
Am I stuck with quicksort?
该帖子的主要动机是
使用itertools.groupby性能对数字进行分组
问题.
另请注意
不仅可以自己提问和回答问题,明确鼓励.
This post was primarily motivated by the
Numpy grouping using itertools.groupby performance
question.
Also note that
it is not merely OK to ask and answer your own question, it is explicitly encouraged.
推荐答案
不,您不会被quicksort所困扰.例如,您可以使用
integer_sort
来自
Boost.Sort
或u4_sort
来自 usort .对数组进行排序时:
No, you are not stuck with quicksort. You could use, for example,
integer_sort
from
Boost.Sort
or u4_sort
from usort. When sorting this array:
array(randint(0, high=1<<32, size=10**8), uint32)
我得到以下结果:
NumPy quicksort: 8.636 s 1.0 (baseline)
Boost.Sort integer_sort: 4.327 s 2.0x speedup
usort u4_sort: 2.065 s 4.2x speedup
基于这个单一的实验和使用,我不会得出结论.
usort
盲目地.我会用我的实际数据进行测试并衡量会发生什么.
您的里程 会有所不同,具体取决于您的数据和计算机.这
Boost.Sort中的integer_sort
具有丰富的调整选项集,请参见
文档.
I would not jump to conclusions based on this single experiment and use
usort
blindly. I would test with my actual data and measure what happens.
Your mileage will vary depending on your data and on your machine. The
integer_sort
in Boost.Sort has a rich set of options for tuning, see the
documentation.
以下,我描述了两种从Python调用本机C或C ++函数的方法.尽管描述很长,但是这样做还是很容易的.
Below I describe two ways to call a native C or C++ function from Python. Despite the long description, it's fairly easy to do it.
提升排序
将这些行放入spreadsort.cpp文件中:
Put these lines into the spreadsort.cpp file:
#include <cinttypes>
#include "boost/sort/spreadsort/spreadsort.hpp"
using namespace boost::sort::spreadsort;
extern "C" {
void spreadsort(std::uint32_t* begin, std::size_t len) {
integer_sort(begin, begin + len);
}
}
它基本上实例化了32位模板化的integer_sort
无符号整数; extern "C"
部分通过禁用来确保C链接
名字改头换面.
假设您正在使用gcc,并且必要的包含boost文件
在/tmp/boost_1_60_0
目录下,您可以对其进行编译:
It basically instantiates the templated integer_sort
for 32 bit
unsigned integers; the extern "C"
part ensures C linkage by disabling
name mangling.
Assuming you are using gcc and that the necessary include files of boost
are under the /tmp/boost_1_60_0
directory, you can compile it:
g++ -O3 -std=c++11 -march=native -DNDEBUG -shared -fPIC -I/tmp/boost_1_60_0 spreadsort.cpp -o spreadsort.so
要生成的关键标志是-fPIC
位置无关代码
和-shared
生成一个
共享对象
.so文件. (有关详细信息,请阅读gcc文档.)
The key flags are -fPIC
to generate
position-independet code
and -shared
to generate a
shared object
.so file. (Read the docs of gcc for further details.)
然后,包装spreadsort()
C ++函数
在Python中使用 ctypes
:
Then, you wrap the spreadsort()
C++ function
in Python using ctypes
:
from ctypes import cdll, c_size_t, c_uint32
from numpy import uint32
from numpy.ctypeslib import ndpointer
__all__ = ['integer_sort']
# In spreadsort.cpp: void spreadsort(std::uint32_t* begin, std::size_t len)
lib = cdll.LoadLibrary('./spreadsort.so')
sort = lib.spreadsort
sort.restype = None
sort.argtypes = [ndpointer(c_uint32, flags='C_CONTIGUOUS'), c_size_t]
def integer_sort(arr):
assert arr.dtype == uint32, 'Expected uint32, got {}'.format(arr.dtype)
sort(arr, arr.size)
或者,您可以使用 cffi :
from cffi import FFI
from numpy import uint32
__all__ = ['integer_sort']
ffi = FFI()
ffi.cdef('void spreadsort(uint32_t* begin, size_t len);')
C = ffi.dlopen('./spreadsort.so')
def integer_sort(arr):
assert arr.dtype == uint32, 'Expected uint32, got {}'.format(arr.dtype)
begin = ffi.cast('uint32_t*', arr.ctypes.data)
C.spreadsort(begin, arr.size)
在cdll.LoadLibrary()
和ffi.dlopen()
调用中,我假设
spreadsort.so
文件的路径是./spreadsort.so
.或者,
你可以写
At the cdll.LoadLibrary()
and ffi.dlopen()
calls I assumed that the
path to the spreadsort.so
file is ./spreadsort.so
. Alternatively,
you can write
lib = cdll.LoadLibrary('spreadsort.so')
或
C = ffi.dlopen('spreadsort.so')
如果将spreadsort.so
的路径附加到LD_LIBRARY_PATH
环境
多变的.另请参见共享库.
if you append the path to spreadsort.so
to the LD_LIBRARY_PATH
environment
variable. See also Shared Libraries.
用法.在两种情况下,您只需调用上述Python包装函数integer_sort()
用32位无符号整数的numpy数组.
Usage. In both cases you simply call the above Python wrapper function integer_sort()
with your numpy array of 32 bit unsigned integers.
usort
对于u4_sort
,您可以如下进行编译:
As for u4_sort
, you can compile it as follows:
cc -DBUILDING_u4_sort -I/usr/include -I./ -I../ -I../../ -I../../../ -I../../../../ -std=c99 -fgnu89-inline -O3 -g -fPIC -shared -march=native u4_sort.c -o u4_sort.so
在u4_sort.c
文件所在的目录中发出此命令.
(可能没有那么多hacker的方法,但是我没有弄清楚.我
只是查看了usort目录中的deps.mk文件以查找
必要的编译器标志并包含路径.)
Issue this command in the directory where the u4_sort.c
file is located.
(Probably there is a less hackish way but I failed to figure that out. I
just looked into the deps.mk file in the usort directory to find out
the necessary compiler flags and include paths.)
然后,您可以如下包装C函数:
Then, you can wrap the C function as follows:
from cffi import FFI
from numpy import uint32
__all__ = ['integer_sort']
ffi = FFI()
ffi.cdef('void u4_sort(unsigned* a, const long sz);')
C = ffi.dlopen('u4_sort.so')
def integer_sort(arr):
assert arr.dtype == uint32, 'Expected uint32, got {}'.format(arr.dtype)
begin = ffi.cast('unsigned*', arr.ctypes.data)
C.u4_sort(begin, arr.size)
在上面的代码中,我假设到u4_sort.so
的路径是
附加到LD_LIBRARY_PATH
环境变量.
In the above code, I assumed that the path to u4_sort.so
has been
appended to the LD_LIBRARY_PATH
environment variable.
用法.像以前使用Boost.Sort一样,只需使用32位无符号整数的numpy数组调用上述Python包装函数integer_sort()
.
Usage. As before with Boost.Sort, you simply call the above Python wrapper function integer_sort()
with your numpy array of 32 bit unsigned integers.
这篇关于如何比快速排序更快地对整数数组进行排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!