Pandas nested sort and NaN
Question
I'm trying to understand the expected behavior of DataFrame.sort on columns with NaN values.
Given this DataFrame:
In [36]: df
Out[36]:
     a    b
0    1    9
1    2  NaN
2  NaN    5
3    1    2
4    6    5
5    8    4
6    4    5
Sorting on a single column puts the NaN at the end, as expected:
In [37]: df.sort(columns="a")
Out[37]:
     a    b
0    1    9
3    1    2
1    2  NaN
6    4    5
4    6    5
5    8    4
2  NaN    5
But a nested sort doesn't behave as I would expect: the NaN in column a is left unsorted, in the middle of the result:
In [38]: df.sort(columns=["a","b"])
Out[38]:
     a    b
3    1    2
0    1    9
1    2  NaN
2  NaN    5
6    4    5
4    6    5
5    8    4
Is there a way to make sure the NaNs in a nested sort appear at the end, per column?
Recommended answer
Until this is fixed in pandas, this is what I'm using for my sorting needs. It implements a subset of the functionality of the original DataFrame.sort function and works for numerical values only:
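import numpy as np

def dataframe_sort(df, columns, ascending=True):
    # Numeric columns only: descending order is obtained by negating the values
    a = np.array(df[columns])
    # One sign per column: -1 if descending, 1 if ascending
    if isinstance(ascending, bool):
        ascending = len(columns) * [ascending]
    # Build a list (not a lazy map object) so it can be indexed on Python 3
    signs = [1 if asc else -1 for asc in ascending]
    # np.lexsort uses the LAST key as the primary key, hence the reversed order
    ind = np.lexsort([signs[i] * a[:, i] for i in reversed(range(len(columns)))])
    # ind is a 1-D array of row positions, so pass it to iloc directly
    return df.iloc[ind]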
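The function relies on two properties of np.lexsort: the last key in the sequence is the primary sort key, and NaN values are ordered after all numbers. A minimal sketch to check both on a small example (the arrays a and c here are illustrative values, not part of the original answer):

```python
import numpy as np

# Illustrative data: two NaNs in the primary key
a = np.array([1.0, np.nan, 2.0, np.nan, 1.0])
c = np.array([7.0, 1.0, 6.0, 6.0, 6.0])

# The LAST key is the primary one, so (c, a) means "sort by a, then by c"
order = np.lexsort((c, a))

print(order)     # [4 0 2 1 3]
print(a[order])  # [ 1.  1.  2. nan nan]
```

Negating a column leaves its NaN entries as NaN, which is why they still end up last when a column is sorted in descending order.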
Example usage:
In [4]: df
Out[4]:
      a    b    c
10    1    9    7
11  NaN  NaN    1
12    2  NaN    6
13  NaN    5    6
14    1    2    6
15    6    5  NaN
16    8    4    4
17    4    5    3
In [5]: dataframe_sort(df, ['a', 'c'], False)
Out[5]:
      a    b    c
16    8    4    4
15    6    5  NaN
17    4    5    3
12    2  NaN    6
10    1    9    7
14    1    2    6
13  NaN    5    6
11  NaN  NaN    1
In [6]: dataframe_sort(df, ['b', 'a'], [False, True])
Out[6]:
      a    b    c
10    1    9    7
17    4    5    3
15    6    5  NaN
13  NaN    5    6
16    8    4    4
14    1    2    6
12    2  NaN    6
11  NaN  NaN    1
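Later pandas releases replaced DataFrame.sort with DataFrame.sort_values, which accepts a na_position argument. The sketch below is not part of the original answer; it rebuilds the DataFrame from the question and assumes a pandas version where sort_values is available, in which case the workaround above should not be needed for this case:

```python
import numpy as np
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame(
    {
        "a": [1, 2, np.nan, 1, 6, 8, 4],
        "b": [9, np.nan, 5, 2, 5, 4, 5],
    }
)

# Nested sort: by "a", then by "b", with NaN placed last in each column
result = df.sort_values(by=["a", "b"], na_position="last")

print(list(result.index))  # [3, 0, 1, 6, 4, 5, 2]
```

The row with NaN in column a (index 2) now comes last instead of sitting in the middle of the result.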