pandas 数据框中选定列和计数中值的唯一组合 [英] unique combinations of values in selected columns in pandas data frame and count

查看:50
本文介绍了 pandas 数据框中选定列和计数中值的唯一组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在pandas数据框中的数据如下:

I have my data in pandas data frame as follows:

df1 = pd.DataFrame({'A':['yes','yes','yes','yes','no','no','yes','yes','yes','no'],
                   'B':['yes','no','no','no','yes','yes','no','yes','yes','no']})

所以,我的数据看起来像这样

So, my data looks like this

----------------------------
index         A        B
0           yes      yes
1           yes       no
2           yes       no
3           yes       no
4            no      yes
5            no      yes
6           yes       no
7           yes      yes
8           yes      yes
9            no       no
-----------------------------

我想将其转换为另一个数据框.预期的输出可以在以下python脚本中显示:

I would like to transform it to another data frame. The expected output can be shown in the following python script:

output = pd.DataFrame({'A':['no','no','yes','yes'],'B':['no','yes','no','yes'],'count':[1,2,4,3]})

所以,我的预期输出看起来像这样

So, my expected output looks like this

--------------------------------------------
index      A       B       count
--------------------------------------------
0         no       no        1
1         no      yes        2
2        yes       no        4
3        yes      yes        3
--------------------------------------------

实际上,我可以使用以下命令找到所有组合并对其进行计数:mytable = df1.groupby(['A','B']).size()

Actually, I can achieve to find all combinations and count them by using the following command: mytable = df1.groupby(['A','B']).size()

但是,事实证明,这样的组合在单个列中.我想将组合中的每个值分隔到不同的列中,并且还要为计数结果添加一个以上的列.有可能这样做吗?请问您有什么建议吗?预先谢谢你.

However, it turns out that such combinations are in a single column. I would like to separate each value in a combination into different column and also add one more column for the result of counting. Is it possible to do that? May I have your suggestions? Thank you in advance.

推荐答案

您可以在列'A'和'B'上groupby并调用size,然后调用reset_indexrename生成的列:

You can groupby on cols 'A' and 'B' and call size and then reset_index and rename the generated column:

In [26]:

df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})
Out[26]:
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

更新

一些解释,通过将2列分组,将A和B值相同的行分组,我们调用size,它返回唯一组的数量:

A little explanation, by grouping on the 2 columns, this groups rows where A and B values are the same, we call size which returns the number of unique groups:

In[202]:
df1.groupby(['A','B']).size()

Out[202]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64

所以现在要恢复分组的列,我们调用reset_index:

So now to restore the grouped columns, we call reset_index:

In[203]:
df1.groupby(['A','B']).size().reset_index()

Out[203]: 
     A    B  0
0   no   no  1
1   no  yes  2
2  yes   no  4
3  yes  yes  3

这将还原索引,但大小聚合将变成生成的列0,因此我们必须重命名该名称:

This restores the indices but the size aggregation is turned into a generated column 0, so we have to rename this:

In[204]:
df1.groupby(['A','B']).size().reset_index().rename(columns={0:'count'})

Out[204]: 
     A    B  count
0   no   no      1
1   no  yes      2
2  yes   no      4
3  yes  yes      3

groupby确实接受我们可以将其设置为False的arg as_index,因此它不会使分组列成为索引,但这会生成series,并且您仍然必须还原索引等等...

groupby does accept the arg as_index which we could have set to False so it doesn't make the grouped columns the index, but this generates a series and you'd still have to restore the indices and so on....:

In[205]:
df1.groupby(['A','B'], as_index=False).size()

Out[205]: 
A    B  
no   no     1
     yes    2
yes  no     4
     yes    3
dtype: int64

这篇关于 pandas 数据框中选定列和计数中值的唯一组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆