[Pandas] How to get the top n% of records within each group
Problem description
This is my DataFrame:
import pandas as pd

df = pd.DataFrame(
    [['@1', 'A', 40], ['@2', 'A', 60], ['@3', 'A', 47],
     ['@4', 'B', 33], ['@5', 'B', 69], ['@6', 'B', 22], ['@7', 'B', 90],
     ['@8', 'C', 31], ['@9', 'C', 78], ['@10', 'C', 12],
     ['@11', 'C', 89], ['@12', 'C', 88], ['@13', 'C', 99]],
    columns=['id', 'channel', 'score'])
id channel score
0 @1 A 40
1 @2 A 60
2 @3 A 47
3 @4 B 33
4 @5 B 69
5 @6 B 22
6 @7 B 90
7 @8 C 31
8 @9 C 78
9 @10 C 12
10 @11 C 89
11 @12 C 88
12 @13 C 99
Each channel has its own number of rows, and I set a percentage of 80%.
For each channel I want to take the int(channel_row_count * 0.8) rows with the largest score, so it will be:
Channel A takes int(3 * 0.8) = 2 rows
Channel B takes int(4 * 0.8) = 3 rows
Channel C takes int(6 * 0.8) = 4 rows
id channel score
1 @2 A 60
2 @3 A 47
3 @4 B 33
4 @5 B 69
6 @7 B 90
8 @9 C 78
10 @11 C 89
11 @12 C 88
12 @13 C 99
How can I do this? Thanks.
Recommended answer
a = 0.8
df1 = (df.groupby('channel', group_keys=False)
         .apply(lambda x: x.nlargest(int(len(x) * a), 'score')))
print(df1)
id channel score
1 @2 A 60
2 @3 A 47
6 @7 B 90
4 @5 B 69
3 @4 B 33
12 @13 C 99
10 @11 C 89
11 @12 C 88
8 @9 C 78
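The number of rows kept per channel comes from int(len(x) * a), which truncates toward zero. As a minimal sketch for checking those counts before selecting any rows (the names counts and take are illustrative and not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame(
    [['@1', 'A', 40], ['@2', 'A', 60], ['@3', 'A', 47],
     ['@4', 'B', 33], ['@5', 'B', 69], ['@6', 'B', 22], ['@7', 'B', 90],
     ['@8', 'C', 31], ['@9', 'C', 78], ['@10', 'C', 12],
     ['@11', 'C', 89], ['@12', 'C', 88], ['@13', 'C', 99]],
    columns=['id', 'channel', 'score'])
a = 0.8

# Number of rows in each channel: A=3, B=4, C=6.
counts = df.groupby('channel').size()

# Scale by the fraction and truncate, mirroring int(len(x) * a).
take = (counts * a).astype(int)
print(take)  # A 2, B 3, C 4
```

Because of the truncation, a channel with a single row would keep int(1 * 0.8) = 0 rows, so very small groups can vanish from the result.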
Another solution using sort_values + groupby + head:
df1 = (df.sort_values('score', ascending=False)
         .groupby('channel', group_keys=False)
         .apply(lambda x: x.head(int(len(x) * a)))
         .reset_index(drop=True))
print(df1)
id channel score
0 @2 A 60
1 @3 A 47
2 @7 B 90
3 @5 B 69
4 @4 B 33
5 @13 C 99
6 @11 C 89
7 @12 C 88
8 @9 C 78
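The same sort-then-take idea can also be written without groupby.apply, which newer pandas releases have started to warn about when the applied function touches the grouping column. This is only a sketch of one possible variant, not the original answer's method. It uses cumcount and transform('size'), and the names ordered, position, keep and top are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [['@1', 'A', 40], ['@2', 'A', 60], ['@3', 'A', 47],
     ['@4', 'B', 33], ['@5', 'B', 69], ['@6', 'B', 22], ['@7', 'B', 90],
     ['@8', 'C', 31], ['@9', 'C', 78], ['@10', 'C', 12],
     ['@11', 'C', 89], ['@12', 'C', 88], ['@13', 'C', 99]],
    columns=['id', 'channel', 'score'])
a = 0.8

# Sort once so that, inside each channel, the highest score comes first.
ordered = df.sort_values('score', ascending=False)
grouped = ordered.groupby('channel')

# 0-based position of each row within its channel (0 = highest score).
position = grouped.cumcount()

# Rows to keep per channel, repeated on every row of that channel.
keep = (grouped['score'].transform('size') * a).astype(int)

# Keep rows whose position is below the per-channel cutoff, then tidy up.
top = (ordered[position < keep]
       .sort_values(['channel', 'score'], ascending=[True, False])
       .reset_index(drop=True))
print(top)
```

The boolean mask keeps every column of the original frame, including channel, and selects the same nine rows as the two solutions above.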