Groupby value counts on the dataframe pandas

Views: 112  Published: 2020/5/23 21:17:44  Tags: python, pandas, dataframe, crosstab, pandas-groupby

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame([
    (1, 1, 'term1'),
    (1, 2, 'term2'),
    (1, 1, 'term1'),
    (1, 1, 'term2'),
    (2, 2, 'term3'),
    (2, 3, 'term1'),
    (2, 2, 'term1')
], columns=['id', 'group', 'term'])

I want to group it by id and group and count the occurrences of each term for every (id, group) pair.

So in the end I want one row per (id, group) pair and one column per term, with the counts as values.

I was able to achieve what I want by looping over all the rows with df.iterrows() and creating a new dataframe, but this is clearly inefficient. (If it helps, I know the list of all terms beforehand and there are about 10 of them.)

It looks like I have to group by and then count values, so I tried df.groupby(['id', 'group']).value_counts(), which does not work because value_counts operates on a groupby Series and not on a DataFrame.

Is there any way I can achieve this without looping?

Recommended answer

I use groupby and size:

df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
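
As a minimal sketch of what this line does on the sample data from the question: size() counts the rows in each (id, group, term) combination, and unstack moves the innermost index level (term) into columns, filling combinations that never occur with 0. The all_terms list and the reindex step are assumptions added for illustration; they cover the case mentioned in the question where the full list of terms is known beforehand, and 'term4' is a made-up term that does not appear in the data.

```python
import pandas as pd

df = pd.DataFrame([
    (1, 1, 'term1'),
    (1, 2, 'term2'),
    (1, 1, 'term1'),
    (1, 1, 'term2'),
    (2, 2, 'term3'),
    (2, 3, 'term1'),
    (2, 2, 'term1')
], columns=['id', 'group', 'term'])

# Count rows per (id, group, term), then pivot the term level into columns.
counts = df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
print(counts)
# Counts per (id, group) pair:
#   (1, 1): term1=2, term2=1, term3=0
#   (1, 2): term1=0, term2=1, term3=0
#   (2, 2): term1=1, term2=0, term3=1
#   (2, 3): term1=1, term2=0, term3=0

# Assumption for illustration: the full term list is known in advance.
# 'term4' is hypothetical and never occurs, so its column is all zeros.
all_terms = ['term1', 'term2', 'term3', 'term4']
counts_all = counts.reindex(columns=all_terms, fill_value=0)
print(counts_all)
```

If id and group are wanted as ordinary columns rather than as the index, counts.reset_index() can be applied to the result.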

Setup for testing on 1,000,000 rows:

import numpy as np

# Random test data: 100 ids, 20 groups, 10 terms
df = pd.DataFrame(dict(id=np.random.choice(100, 1000000),
                       group=np.random.choice(20, 1000000),
                       term=np.random.choice(10, 1000000)))
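
One way to use this setup is sketched below. It times the groupby/size approach against two alternatives that are not part of the answer above and are included only for comparison: calling value_counts on the selected term column of the groupby, and pd.crosstab. The function names, the seed, and the number of repetitions are arbitrary choices for this sketch, and the timings depend on the machine and the pandas version, so no figures are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only so the run is repeatable
n = 1000000
df = pd.DataFrame(dict(id=np.random.choice(100, n),
                       group=np.random.choice(20, n),
                       term=np.random.choice(10, n)))


def via_size(d):
    # The approach from the answer: count rows per key, pivot term to columns.
    return d.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)


def via_value_counts(d):
    # Alternative: value_counts on the 'term' column selected from the groupby.
    return d.groupby(['id', 'group'])['term'].value_counts().unstack(fill_value=0)


def via_crosstab(d):
    # Alternative: cross-tabulate (id, group) rows against term columns.
    return pd.crosstab([d['id'], d['group']], d['term'])


candidates = {
    'groupby + size': via_size,
    'groupby + value_counts': via_value_counts,
    'crosstab': via_crosstab,
}

for name, func in candidates.items():
    # Average wall-clock time over a few calls.
    seconds = timeit.timeit(lambda: func(df), number=3) / 3
    print(f'{name}: {seconds:.3f} s per call')
```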
