How to use "Named aggregation"
Problem description
I want to apply two different aggregates on the same column in a pandas DataFrameGroupBy and have the new columns be named.
I've tried using what is shown here in the documentation. https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#named-aggregation
In [82]: animals.groupby("kind").agg(
   ....:     min_height=('height', 'min'),
   ....:     max_height=('height', 'max'),
   ....:     average_weight=('weight', np.mean),
   ....: )
   ....:
Out[82]:
      min_height  max_height  average_weight
kind
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75
Something like what I'm trying to do is:
import pandas as pd

df = pd.DataFrame({"year": [2001, 2001, 2001, 2005, 2005],
                   "value": [1, 2, 5, 3, 1]})
df = df.groupby("year").agg(sum=('value', 'sum'),
                            count=('value', 'size'))
However, this gives the following:
TypeError: aggregate() missing 1 required positional argument: 'arg'
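Named aggregation was added in pandas 0.25.0, so this error suggests you are running an older version. If you have not updated pandas to 0.25.0, you can still apply two aggregation functions to one column by selecting that column and passing a list of functions: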
df = df.groupby("year").value.agg(['sum','count'])
df
      sum  count
year
2001    8      3
2005    4      2
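With the list form, the result columns are named after the functions themselves. If custom names are wanted on an older pandas, one possible approach is to rename the columns afterwards. The sketch below assumes the same sample data as the question; the names `total` and `n_rows` are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"year": [2001, 2001, 2001, 2005, 2005],
                   "value": [1, 2, 5, 3, 1]})

# Select the column, apply a list of aggregations, then rename the
# resulting columns (the new names here are illustrative only).
result = (
    df.groupby("year")["value"]
      .agg(["sum", "count"])
      .rename(columns={"sum": "total", "count": "n_rows"})
)
print(result)
```

This does not rely on the keyword syntax, so it should behave the same before and after 0.25.0.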
In pandas 0.25.0, the named aggregation syntax from the question works when applied to the original df:
pd.__version__
'0.25.0'
df = df.groupby("year").agg(sum=('value', 'sum'),
                            count=('value', 'count'))
df
      sum  count
year
2001    8      3
2005    4      2
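The same named aggregation can also be written with `pd.NamedAgg`, which spells out the `column` and `aggfunc` of each output instead of using a bare tuple. This is a minimal sketch assuming pandas 0.25.0 or later and the question's sample data; the output names `total` and `n_rows` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"year": [2001, 2001, 2001, 2005, 2005],
                   "value": [1, 2, 5, 3, 1]})

# Each keyword becomes an output column; pd.NamedAgg names the source
# column and the aggregation explicitly (requires pandas >= 0.25.0).
result = df.groupby("year").agg(
    total=pd.NamedAgg(column="value", aggfunc="sum"),
    n_rows=pd.NamedAgg(column="value", aggfunc="count"),
)
print(result)
```

The tuple form `('value', 'sum')` and `pd.NamedAgg(column='value', aggfunc='sum')` are interchangeable here; the explicit form may be easier to read when there are many aggregations.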