Difference between "as_index = False" and "reset_index()" in pandas groupby
Question
I just wanted to know the difference between what these two do.
Data:
import pandas as pd
df = pd.DataFrame({"ID":["A","B","A","C","A","A","C","B"], "value":[1,2,4,3,6,7,3,4]})
reset_index():
df_group1 = df.groupby("ID").sum().reset_index()
as_index=False:
df_group2 = df.groupby("ID", as_index=False).sum()
Both of them give exactly the same output.
  ID  value
0  A     18
1  B      6
2  C      6
Can anyone tell me what the difference is, with an example that illustrates it?
Recommended answer
When you use as_index=False, you tell groupby() that you don't want the column ID to become the index. When both implementations yield the same result, use as_index=False, because it saves you some typing and an unnecessary pandas operation ;)
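To make the difference concrete, here is a minimal sketch using the data from the question. The variable names (indexed, flat, via_reset) are only illustrative. It shows where the ID key ends up in each case and checks that resetting the index gives the same frame as as_index=False for this simple sum.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# Default (as_index=True): the grouping key becomes the index of the result.
indexed = df.groupby("ID").sum()

# as_index=False: the grouping key stays an ordinary column.
flat = df.groupby("ID", as_index=False).sum()

# reset_index() moves the key from the index back into a column.
via_reset = indexed.reset_index()

print(list(indexed.columns))   # ['value']
print(list(flat.columns))      # ['ID', 'value']
print(flat.equals(via_reset))
```

For a plain aggregation like this the two routes end in the same place; as_index=False just skips the intermediate indexed frame.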
However, sometimes you want to apply more complicated operations to your groups. On those occasions, you might find that one is better suited than the other.
Example 1: You want to sum the values of three variables (i.e. columns) in a group on both axes.
Using as_index=True lets you apply a sum over axis=1 without specifying the column names, because the grouping key sits in the index rather than among the columns, and then sum the values over axis 0. When the operation is finished, you can use reset_index(drop=True/False) to get the dataframe back into the right form.
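One possible reading of Example 1, as a rough sketch: the frame below and its column names (x, y, z, total) are invented for illustration and are not from the original answer. Since sums commute, the per-group sum over axis 0 is done first here and the row-wise sum over axis=1 second.

```python
import pandas as pd

# Hypothetical data: one key column and three numeric variables.
df3 = pd.DataFrame({"ID": ["A", "B", "A", "C"],
                    "x": [1, 2, 3, 4],
                    "y": [10, 20, 30, 40],
                    "z": [100, 200, 300, 400]})

# With the default as_index=True, ID lives in the index, so sum(axis=1)
# only sees x, y and z and no column names have to be listed.
totals = df3.groupby("ID").sum().sum(axis=1)

# Bring ID back as a regular column once the computation is done.
result = totals.reset_index(name="total")
print(result)
```

With as_index=False, ID would be one of the columns, so the row-wise sum would need an explicit column selection to leave it out.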
Example 2: You need to set a value for the group based on the columns in the groupby().
Setting as_index=False allows you to check the condition on an ordinary column rather than on an index, which is often much easier.
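A small sketch of what Example 2 might look like, using the question's data. The label column and its values are hypothetical, chosen only to show a condition on the key column.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

# ID remains a normal column, so it can be used in a boolean mask directly.
out = df.groupby("ID", as_index=False).sum()

# Hypothetical rule: tag group "A" differently from the others.
out["label"] = "other"
out.loc[out["ID"] == "A", "label"] = "priority"
print(out)
```

With the default as_index=True, the same condition would have to be written against out.index instead of out["ID"].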
At some point, you might come across a KeyError when applying operations on groups. In that case, it is often because you are trying to use a column in your aggregate function that is currently an index of your GroupBy object.
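The sketch below shows one common way to run into this kind of KeyError. It is not necessarily the exact situation the answer has in mind: here the key is looked up as a column on the grouped result while it actually sits in the index.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "A", "C", "A", "A", "C", "B"],
                   "value": [1, 2, 4, 3, 6, 7, 3, 4]})

indexed = df.groupby("ID").sum()   # ID is the index, not a column

try:
    indexed["ID"]                  # column lookup fails: no such column
    raised = False
except KeyError:
    raised = True

# Either keep the key as a column from the start...
flat = df.groupby("ID", as_index=False).sum()
ids = flat["ID"].tolist()

# ...or read it from the index of the indexed result.
ids_from_index = indexed.index.tolist()
print(raised, ids, ids_from_index)
```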