Unique combinations of values in selected columns in a pandas data frame, with counts
Question
I have my data in a pandas data frame as follows:
import pandas as pd
df1 = pd.DataFrame({'A':['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                    'B':['yes','no','no','no','yes','yes','no','yes','yes','no']})
So, my data looks like this:
----------------------------
index A B
0 yes yes
1 yes no
2 yes no
3 yes no
4 no yes
5 no yes
6 yes no
7 yes yes
8 yes yes
9 no no
-----------------------------
I would like to transform it into another data frame. The expected output can be expressed with the following Python code:
output = pd.DataFrame({'A':['no','no','yes','yes'],'B':['no','yes','no','yes'],'count':[1,2,4,3]})
So, my expected output looks like this:
--------------------------------------------
index A B count
--------------------------------------------
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
--------------------------------------------
In fact, I can already find all combinations and count them with the following command: mytable = df1.groupby(['A','B']).size()
However, it turns out that the combinations end up in a single column. I would like to separate each value of a combination into its own column and add one more column for the count. Is that possible? Any suggestions would be appreciated. Thank you in advance.
Recommended answer
You can groupby on columns 'A' and 'B', call size, and then reset_index and rename the generated column:
In [26]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
Update
A little explanation: grouping on the two columns puts rows with the same A and B values into the same group. We then call size, which returns the number of rows in each group:
In[202]:
df1.groupby(['A','B']).size()
Out[202]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
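The result above is a Series whose group keys live in a two-level index rather than in columns, which is why the combinations look like they are packed into one column. A minimal sketch to check this, reusing df1 from the question (the variable names counts, index_names and yes_no are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

counts = df1.groupby(['A', 'B']).size()

# The group keys are levels of the index, not ordinary columns.
index_names = list(counts.index.names)

# A single combination can be looked up by its (A, B) key.
yes_no = counts.loc[('yes', 'no')]
```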
So now, to turn the grouped columns back into regular columns, we call reset_index:
In[203]:
df1.groupby(['A','B']).size().reset_index()
Out[203]:
A B 0
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
This restores A and B as columns, but the size aggregation ends up in a generated column named 0, so we have to rename it:
In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[204]:
A B count
0 no no 1
1 no yes 2
2 yes no 4
3 yes yes 3
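As a possible shortcut, Series.reset_index accepts a name argument for the values column, so the separate rename step can be folded in. A small self-contained sketch of that variant (not the form used in the answer above; result is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

# size() gives a Series indexed by (A, B); reset_index(name=...) moves the
# index levels into columns and names the values column in one step.
result = df1.groupby(['A', 'B']).size().reset_index(name='count')
print(result)
```

This should produce the same three columns (A, B, count) as the rename version.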
groupby does accept the as_index argument, which we could have set to False so that the grouped columns are not made the index. In the pandas version used here, however, size still returns a Series with the group keys in the index, so you would still have to restore the index and rename the column (newer pandas releases, 1.1 and later as far as I know, return a data frame with a size column instead):
In[205]:
df1.groupby(['A','B'], as_index=False).size()
Out[205]:
A B
no no 1
yes 2
yes no 4
yes 3
dtype: int64
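Because the return type of size with as_index=False appears to depend on the pandas version, a defensive sketch can handle both cases. The version split (a Series in older releases, a data frame with a size column in newer ones) is stated here as an assumption, and sized and result are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes', 'no'],
    'B': ['yes', 'no', 'no', 'no', 'yes', 'yes', 'no', 'yes', 'yes', 'no'],
})

sized = df1.groupby(['A', 'B'], as_index=False).size()

if isinstance(sized, pd.Series):
    # Assumed older behaviour: keys are still in the index, as in Out[205].
    result = sized.reset_index(name='count')
else:
    # Assumed newer behaviour: A and B are already columns, counts are in 'size'.
    result = sized.rename(columns={'size': 'count'})

print(result)
```

Either branch should end with the columns A, B and count.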