Fastest way to sort each row in a pandas dataframe
Question
I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.
Something like this:
A B C D
3 4 8 1
9 2 7 2
needs to become:
A B C D
8 4 3 1
9 7 2 2
Right now I'm applying a sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (which is why I'm using pandas and not numpy). Would it be quicker to create a list of lists and then build the new dataframe in one go? Or do I need to go to Cython?
Recommended answer
I think I would do this in numpy:
In [11]: a = df.values
In [12]: a.sort(axis=1) # no ascending argument
In [13]: a = a[:, ::-1] # so reverse
In [14]: a
Out[14]:
array([[8, 4, 3, 1],
[9, 7, 2, 2]])
In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
A B C D
0 8 4 3 1
1 9 7 2 2
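The session above can also be written as a standalone script. This is a minimal sketch, assuming a pandas version that provides `DataFrame.to_numpy()`; the variable names `ascending`, `descending` and `result` are illustrative only. It uses `np.sort`, which returns a sorted copy, rather than the in-place `a.sort()`, since the array returned by `df.values` can share memory with the frame and an in-place sort may then alter `df` itself.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# np.sort returns a sorted copy, so df is left untouched.
# axis=1 sorts the values within each row (ascending).
ascending = np.sort(df.to_numpy(), axis=1)

# np.sort has no descending option, so reverse each row afterwards.
descending = ascending[:, ::-1]

# Rebuild a frame with the original index and column labels.
result = pd.DataFrame(descending, index=df.index, columns=df.columns)
print(result)
```

Because the whole array is sorted in one vectorised call, no per-row Python loop is involved; any extra per-row work would still need to be handled separately.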
I had thought this might work, but it reorders the columns by label rather than sorting the values within each row:
In [21]: df.sort(axis=1, ascending=False)
Out[21]:
D C B A
0 1 8 4 3
1 2 7 2 9
Ah, pandas raises:
In [22]: df.sort(df.columns, axis=1, ascending=False)
ValueError: When sorting by column, axis must be 0 (rows)
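`DataFrame.sort` has since been removed from pandas, so the two calls above will not run on current releases. As a rough sketch of the same observation with the present-day API, assuming `sort_index` as the label-based replacement (the name `by_label` is illustrative), sorting along `axis=1` still only reorders the column labels:

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Sorts the column labels in descending order (D, C, B, A);
# the values inside each row are not sorted.
by_label = df.sort_index(axis=1, ascending=False)
print(by_label)
```

This is why the numpy route is used for sorting the values within each row.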