pandas groupby result using different combinations of boolean array as keys
Question
I am trying to understand how groupby works when a boolean array is used as the key. Here is the test code:
import pandas as pd
a = pd.DataFrame([[True,False,False],[False,True,False]], columns=['A','B','C'])
print(a)
A B C
0 True False False
1 False True False
Then I tried different combinations of boolean arrays, and the groupby results all seem to be the same:
b=a.groupby([False,False])
b.apply(pd.DataFrame)
A B C
0 True False False
1 False True False
c=a.groupby([True,False])
c.apply(pd.DataFrame)
A B C
0 True False False
1 False True False
d=a.groupby([False,True])
d.apply(pd.DataFrame)
A B C
0 True False False
1 False True False
e=a.groupby([False,True])
e.apply(pd.DataFrame)
A B C
0 True False False
1 False True False
Recommended answer
Let's break it down.
.groupby().apply(pd.DataFrame), as you use it in all variants, takes the rows of each group and builds a DataFrame from them, which basically returns the group itself. That is why the output looks the same, but the way pandas gets there is different in each case.
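To make the "basically returns self" point concrete, here is a minimal sketch that skips apply and walks over the groups by hand. The names pieces, rebuilt and restored are only illustrative, and the manual concat plus sort_index is a stand-in for what apply does internally, not a copy of it.

```python
import pandas as pd

a = pd.DataFrame([[True, False, False], [False, True, False]],
                 columns=['A', 'B', 'C'])

# pd.DataFrame(df) just wraps the same data, so each group comes back unchanged.
pieces = []
for key, group in a.groupby([True, False]):
    rebuilt = pd.DataFrame(group)
    assert rebuilt.equals(group)
    pieces.append(rebuilt)

# Gluing the per-group frames back together and restoring the row order
# gives a frame equal to the original one.
restored = pd.concat(pieces).sort_index()
print(len(pieces), restored.equals(a))
```

With the key [True, False] there are two one-row pieces; with [False, False] the loop would run once over a single two-row group.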
b=a.groupby([False,False]): both rows belong to the same group (group id False) and are processed together in a single call, which produces the same df.
c=a.groupby([True,False]): there are two groups with one row each. Apply takes each group and builds two separate DataFrames (one per group), then concatenates them and returns a df identical to the original.
d=a.groupby([False,True]): the same as the [True,False] case, but now the first row belongs to group False. If you aggregated or applied a different function (anything other than pandas.DataFrame), you would see a result indexed by the group keys False, True (groupby sorts the keys by default). Here row 0 still comes first because it belongs to group False; in the [True,False] case row 1 would come first, because there it is the row in group False.
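The difference between the three keys becomes visible as soon as an aggregation is used instead of pd.DataFrame. The sketch below is one way to check it; the helper Series rows and the dictionaries variants and summary are made up for this illustration. It aggregates the row labels with first(), so each group reports which original row it holds.

```python
import pandas as pd

a = pd.DataFrame([[True, False, False], [False, True, False]],
                 columns=['A', 'B', 'C'])

# Row labels as values, so the aggregation shows where each row ends up.
rows = pd.Series(a.index, index=a.index)

variants = {'b': [False, False], 'c': [True, False], 'd': [False, True]}
summary = {}
for name, keys in variants.items():
    agg = rows.groupby(keys).first()   # one entry per group, keys sorted
    summary[name] = (agg.index.tolist(), agg.tolist())
    print(name, summary[name])

# b -> one group:  keys [False],       rows [0]
# c -> two groups: keys [False, True], rows [1, 0]
# d -> two groups: keys [False, True], rows [0, 1]
```

The group keys come out sorted as False, True in both two-group cases, so only the [True,False] key changes the row order.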