How to perform a multiple groupby and transform count with a condition in pandas
This is an extension of a previous question.
I am trying to add an extra column to the groupby:
# Import pandas library
import pandas as pd
import numpy as np
# data
data = [['tom', 10,2,'c',100,'x'], ['tom',16 ,3,'a',100,'x'], ['tom', 22,2,'a',100,'x'],
['matt', 10,1,'c',100,'x'], ['matt', 15,5,'b',100,'x'], ['matt', 14,1,'b',100,'x']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Name', 'Attempts','Score','Category','Rating','Other'])
df['AttemptsbyRating'] = df.groupby(by=['Rating','Other'])['Attempts'].transform('count')
df
Then I try to add another column with the number of rows that have a Score greater than 1 (which should equal 4):
df['scoregreaterthan1'] = df['Score'].gt(1).groupby(by=df[['Rating','Other']]).transform('sum')
But I am getting a
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
Any ideas? Thanks very much!
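df['Score'].gt(1)
returns a boolean Series rather than a DataFrame, so it has no Rating or Other columns of its own to group on. Passing the DataFrame df[['Rating','Other']] as by raises the error because each grouper must be one-dimensional. Filter the DataFrame first, then group by the relevant columns.
Use: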
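# keep only rows with Score > 1; copy so the new column is not assigned to a slice of the original
df = df[df['Score'].gt(1)].copy()
df['scoregreaterthan1'] = df.groupby(['Rating','Other'])['Score'].transform('count')
df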
Output:
   Name  Attempts  Score Category  Rating Other  AttemptsbyRating  scoregreaterthan1
0   tom        10      2        c     100     x                 6                  4
1   tom        16      3        a     100     x                 6                  4
2   tom        22      2        a     100     x                 6                  4
4  matt        15      5        b     100     x                 6                  4
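The expression from the question can also be made to work without filtering. The following is a minimal sketch, assuming the same data as in the question: the grouping keys are passed as a list of Series (each one-dimensional) instead of a DataFrame, and the boolean mask is cast to int so that the per-group sum is an integer count. The name mask is introduced here only for illustration.

```python
import pandas as pd

data = [['tom', 10, 2, 'c', 100, 'x'], ['tom', 16, 3, 'a', 100, 'x'], ['tom', 22, 2, 'a', 100, 'x'],
        ['matt', 10, 1, 'c', 100, 'x'], ['matt', 15, 5, 'b', 100, 'x'], ['matt', 14, 1, 'b', 100, 'x']]
df = pd.DataFrame(data, columns=['Name', 'Attempts', 'Score', 'Category', 'Rating', 'Other'])

# 1 where Score > 1, else 0 (illustrative helper name)
mask = df['Score'].gt(1).astype(int)

# Group the mask by a list of Series, not by a DataFrame, then broadcast the per-group sum
df['scoregreaterthan1'] = mask.groupby([df['Rating'], df['Other']]).transform('sum')
```

With this form every row stays in df, including the rows whose Score is not greater than 1.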
If you want to keep the rows whose score is not greater than 1, then instead of this:
df = df[df['Score'].gt(1)].copy()
df['scoregreaterthan1'] = df.groupby(['Rating','Other'])['Score'].transform('count')
do this:
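# count within the filtered rows; assignment aligns on the index, so filtered-out rows become NaN
df['scoregreaterthan1'] = df[df['Score'].gt(1)].groupby(['Rating','Other'])['Score'].transform('count')
# forward-fill those NaN rows and restore the integer dtype
df['scoregreaterthan1'] = df['scoregreaterthan1'].ffill().astype(int)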
Output 2:
   Name  Attempts  Score Category  Rating Other  AttemptsbyRating  scoregreaterthan1
0   tom        10      2        c     100     x                 6                  4
1   tom        16      3        a     100     x                 6                  4
2   tom        22      2        a     100     x                 6                  4
3  matt        10      1        c     100     x                 6                  4
4  matt        15      5        b     100     x                 6                  4
5  matt        14      1        b     100     x                 6                  4
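Note that ffill copies the value from the previous row regardless of group, and it leaves a leading NaN in place, so the approach above relies on all rows sharing one (Rating, Other) group and on the first row passing the filter, as is the case here. The following is a sketch of a variant that does not depend on row order. The data is hypothetical (two groups, with the first row failing the filter), and keys and counts are illustrative names: the counts are computed per group on the filtered frame and then joined back onto every row.

```python
import pandas as pd

# Hypothetical data: two (Rating, Other) groups; the second has no Score above 1
df = pd.DataFrame({
    'Name': ['tom', 'tom', 'matt', 'matt', 'ann'],
    'Score': [1, 3, 2, 1, 1],
    'Rating': [100, 100, 100, 100, 200],
    'Other': ['x', 'x', 'x', 'x', 'y'],
})

keys = ['Rating', 'Other']

# Number of rows with Score > 1 in each group, computed on the filtered frame
counts = df[df['Score'].gt(1)].groupby(keys).size().rename('scoregreaterthan1')

# Attach the per-group count to every row; groups with no qualifying rows get 0
df = df.join(counts, on=keys)
df['scoregreaterthan1'] = df['scoregreaterthan1'].fillna(0).astype(int)
```

Here the (100, 'x') rows all receive 2 and the (200, 'y') row receives 0, whereas a forward fill would have left the first row empty and carried the 2 into the second group.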