Pandas: Sort innermost column group-wise based on other multilevel column
Problem description
Consider the following df:
In [3771]: df = pd.DataFrame({'A': ['a'] * 11,
'B': ['b'] * 11,
'C': ['C1', 'C1', 'C2','C1', 'C3', 'C3', 'C2', 'C3', 'C3', 'C2', 'C2'],
'D': ['D1', 'D2', 'D1', 'D3', 'D3', 'D2', 'D4', 'D4', 'D1', 'D2', 'D3'],
'E': [{'value': '4', 'percentage': None}, {'value': 5, 'percentage': None}, {'value': 12, 'percentage': None}, {'value': 5, 'percentage': None}, {'value': '12', 'percentage': None}, {'value': 'N/A', 'percentage': None}, {}, {'value': 19, 'percentage': None}, {'value': 12, 'percentage': None}, {'value': 11, 'percentage': None}, np.nan],
'F':[{'value': 72, 'percentage': None}, {'value': 72, 'percentage': None}, {'value': 66, 'percentage': None}, {'value': 62, 'percentage': None}, {'value': 66, 'percentage': None}, {'value': 16, 'percentage': None}, {'value': 67, 'percentage': None}, {'value': 67, 'percentage': None}, {'value': 66, 'percentage': None}, {'value': 54, 'percentage': None}, {'value': 78, 'percentage': None}]})
In [3779]: df
Out[3898]:
A B C D E F
0 a b C1 D1 {'value': '4', 'percentage': None} {'value': 72, 'percentage': None}
1 a b C1 D2 {'value': 5, 'percentage': None} {'value': 72, 'percentage': None}
2 a b C2 D1 {'value': 12, 'percentage': None} {'value': 66, 'percentage': None}
3 a b C1 D3 {'value': 5, 'percentage': None} {'value': 62, 'percentage': None}
4 a b C3 D3 {'value': '12', 'percentage': None} {'value': 66, 'percentage': None}
5 a b C3 D2 {'value': 'N/A', 'percentage': None} {'value': 16, 'percentage': None}
6 a b C2 D4 {} {'value': 67, 'percentage': None}
7 a b C3 D4 {'value': 19, 'percentage': None} {'value': 67, 'percentage': None}
8 a b C3 D1 {'value': 12, 'percentage': None} {'value': 66, 'percentage': None}
9 a b C2 D2 {'value': 11, 'percentage': None} {'value': 54, 'percentage': None}
10 a b C2 D3 NaN {'value': 78, 'percentage': None}
I pivot the above df:
In [3776]: x = df.pivot(index=['B', 'C', 'D'], columns='A', values=['E', 'F'])
In [3781]: x
Out[3900]:
E F
A a a
B C D
b C1 D1 {'value': '4', 'percentage': None} {'value': 72, 'percentage': None}
D2 {'value': 5, 'percentage': None} {'value': 72, 'percentage': None}
D3 {'value': 5, 'percentage': None} {'value': 62, 'percentage': None}
C2 D1 {'value': 12, 'percentage': None} {'value': 66, 'percentage': None}
D2 {'value': 11, 'percentage': None} {'value': 54, 'percentage': None}
D3 NaN {'value': 78, 'percentage': None}
D4 {} {'value': 67, 'percentage': None}
C3 D1 {'value': 12, 'percentage': None} {'value': 66, 'percentage': None}
D2 {'value': 'N/A', 'percentage': None} {'value': 16, 'percentage': None}
D3 {'value': '12', 'percentage': None} {'value': 66, 'percentage': None}
D4 {'value': 19, 'percentage': None} {'value': 67, 'percentage': None}
I want to sort the innermost index level, D, within each group of the outer levels B and C. The sort should be in descending order of the value key of the dicts stored in the multi-level column (E, a).
The value key of the dict can hold mixed data types: it can be an int, a str, NaN, or missing altogether.
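As a rough illustration of what "mixed" means here, the sketch below passes a few made-up cells (not the rows of the df above) through Series.str.get and pd.to_numeric with errors='coerce'. This is one possible way to reduce such cells to a sortable number, assuming it is acceptable for anything that cannot be parsed to become NaN.

```python
import numpy as np
import pandas as pd

# Made-up cells covering the cases described: numeric string, int,
# non-numeric string, dict without a 'value' key, and a bare NaN
cells = pd.Series(
    [{'value': '4'}, {'value': 5}, {'value': 'N/A'}, {}, np.nan],
    dtype=object,
)

# .str.get looks up the key in each dict; a missing key gives None, NaN stays NaN
raw = cells.str.get('value')

# errors='coerce' parses numeric strings and turns everything else into NaN
numeric = pd.to_numeric(raw, errors='coerce')
print(numeric)
```

Only the first two cells end up with a number (4 and 5); the remaining three all become NaN, so they cannot be told apart by the sort key.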
Expected output:
E F
A a a
B C D
b C1 D2 {'value': 5, 'percentage': None} {'value': 72, 'percentage': None}
D3 {'value': 5, 'percentage': None} {'value': 62, 'percentage': None}
D1 {'value': '4', 'percentage': None} {'value': 72, 'percentage': None}
C2 D1 {'value': 12, 'percentage': None} {'value': 66, 'percentage': None}
D2 {'value': 11, 'percentage': None} {'value': 54, 'percentage': None}
D4 {} {'value': 67, 'percentage': None}
D3 NaN {'value': 78, 'percentage': None}
C3 D4 {'value': 19, 'percentage': None} {'value': 67, 'percentage': None}
D1 {'value': 12, 'percentage': None} {'value': 66, 'percentage': None}
D3 {'value': '12', 'percentage': None} {'value': 66, 'percentage': None}
D2 {'value': 'N/A', 'percentage': None} {'value': 16, 'percentage': None}
Recommended answer
One solution is to create a helper MultiIndex column with Series.str.get, sort by it with DataFrame.sort_values, and finally remove the helper column:
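# Helper column: take 'value' from each dict and coerce it to a number
# (non-numeric or missing values become NaN)
x[('new', 'a')] = pd.to_numeric(x[('E', 'a')].str.get('value'), errors='coerce')
# Outer index levels (B and C) that define the groups
lvl = list(x.index.names[:-1])
order = 'desc'
# Sort by the groups first, then by the helper column, and drop the helper
x = (x.sort_values(lvl + [('new', 'a')],
                   ascending=[True] * len(lvl) + [order == 'asc'])
      .drop(('new', 'a'), axis=1))
print(x)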
E \
A a
B C D
b C1 D2 {'value': 5, 'percentage': None}
D3 {'value': 5, 'percentage': None}
D1 {'value': '4', 'percentage': None}
C2 D1 {'value': 12, 'percentage': None}
D2 {'value': 11, 'percentage': None}
D3 NaN
D4 {}
C3 D4 {'value': 19, 'percentage': None}
D1 {'value': 12, 'percentage': None}
D3 {'value': '12', 'percentage': None}
D2 {'value': 'N/A', 'percentage': None}
F
A a
B C D
b C1 D2 {'value': 72, 'percentage': None}
D3 {'value': 62, 'percentage': None}
D1 {'value': 72, 'percentage': None}
C2 D1 {'value': 66, 'percentage': None}
D2 {'value': 54, 'percentage': None}
D3 {'value': 78, 'percentage': None}
D4 {'value': 67, 'percentage': None}
C3 D4 {'value': 67, 'percentage': None}
D1 {'value': 66, 'percentage': None}
D3 {'value': 66, 'percentage': None}
D2 {'value': 16, 'percentage': None}
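The same idea can be wrapped in a function that leaves the input frame untouched. The following is only a sketch under assumptions: sort_innermost, its parameters, and the temporary _key column are hypothetical names introduced for illustration, and the small demo frame is made up rather than taken from the question.

```python
import numpy as np
import pandas as pd


def sort_innermost(frame, col, key="value", descending=True):
    """Hypothetical helper: reorder rows inside each outer-level group."""
    # Numeric sort key taken from the dicts in `col`; unparseable -> NaN
    keys = pd.to_numeric(frame[col].str.get(key), errors="coerce")
    outer = list(frame.index.names[:-1])
    # Temporary frame with a 0..n-1 index holding the outer levels and the key
    helper = frame.index.to_frame(index=False)[outer].assign(_key=keys.to_numpy())
    order = helper.sort_values(
        outer + ["_key"],
        ascending=[True] * len(outer) + [not descending],
    ).index.to_numpy()
    # Positional selection returns a reordered copy; `frame` is not modified
    return frame.iloc[order]


# Made-up demo data with the same index layout (B, C, D) as in the question
idx = pd.MultiIndex.from_tuples(
    [("b", "C1", "D1"), ("b", "C1", "D2"), ("b", "C2", "D1"), ("b", "C2", "D2")],
    names=["B", "C", "D"],
)
demo = pd.DataFrame(
    {("E", "a"): [{"value": "4"}, {"value": 5}, np.nan, {"value": 11}]},
    index=idx,
)
result = sort_innermost(demo, ("E", "a"))
print(result.index.get_level_values("D").tolist())
```

Because sort_values places NaN last by default, rows whose key cannot be parsed end up at the bottom of their group in both ascending and descending order.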