Merging and summing up several value_counts() Series in pandas


Question

I usually use value_counts() to get the number of occurrences of a value. However, I am now dealing with large database tables that cannot be loaded fully into RAM, so I query the data in chunks of one month.

Is there a way to store the result of value_counts() and merge it with, or add it to, the next results?

I want to count the number of user actions. Assume the following structure of user-activity logs:

# month 1
id    userId     actionType
1     1          a
2     1          c
3     2          a
4     3          a
5     3          b

# month 2
id    userId     actionType
6     1          b
7     1          b
8     2          a
9     3          c

Using value_counts() on the userId column of each month produces:

# month 1
userId
1       2
2       1
3       2

# month 2
userId
1       2
2       1
3       1

Expected output:

# month 1+2
userId
1       4
2       2
3       3

So far, I have only found a method using groupby and sum:

import pandas as pd

# count each user's actions and store the count in a new column
df1['count'] = df1.groupby(['userId'], sort=False)['id'].transform('count')
# drop the columns that are no longer needed
df1 = df1[['userId', 'count']]
# drop the duplicate rows, keeping one row per user
df1 = df1.drop_duplicates(subset=['userId'])

# repeat for the second month
df2['count'] = df2.groupby(['userId'], sort=False)['id'].transform('count')
df2 = df2[['userId', 'count']]
df2 = df2.drop_duplicates(subset=['userId'])

# merge and sum up
print(pd.concat([df1, df2]).groupby(['userId'], sort=False).sum())

What is the Pythonic / pandas way of efficiently merging the information of several Series (and DataFrames)?

Recommended answer

Let me suggest using add with a fill value of 0. Compared with the previously suggested answer, this has the advantage that it also works when the two DataFrames have non-identical sets of unique keys.

import pandas as pd

# Create frames
df1 = pd.DataFrame({'User_id': ['a', 'a', 'b', 'c', 'c', 'd'], 'a': [1, 1, 2, 3, 3, 5]})
df2 = pd.DataFrame({'User_id': ['a', 'a', 'b', 'b', 'c', 'c', 'c'], 'a': [1, 1, 2, 2, 3, 3, 4]})

Now add the two sets of value_counts() results. The fill_value argument handles any NaN values that would otherwise arise; in this example, that is the 'd' that appears in df1 but not in df2.

a = df1.User_id.value_counts()
b = df2.User_id.value_counts()
a.add(b,fill_value=0)
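
To connect this back to the monthly chunks in the question, here is a minimal sketch of keeping a running total across any number of chunks. The helper name accumulate_counts and the two hand-built monthly frames are assumptions made for illustration; they are not part of the original answer. Because add with fill_value can return floats when the keys differ between chunks, the sketch casts the final result back to integers.

```python
import pandas as pd


def accumulate_counts(chunks, column):
    """Sum value_counts() of `column` over an iterable of DataFrames.

    Hypothetical helper for illustration only.
    """
    total = None
    for chunk in chunks:
        counts = chunk[column].value_counts()
        # fill_value=0 treats keys missing on either side as a count of 0
        total = counts if total is None else total.add(counts, fill_value=0)
    if total is None:
        # no chunks were supplied
        return pd.Series(dtype="int64")
    # misaligned keys make the sum float, so cast back to integer counts
    return total.astype("int64").sort_index()


# The two months from the question, rebuilt by hand
month1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "userId": [1, 1, 2, 3, 3],
    "actionType": ["a", "c", "a", "a", "b"],
})
month2 = pd.DataFrame({
    "id": [6, 7, 8, 9],
    "userId": [1, 1, 2, 3],
    "actionType": ["b", "b", "a", "c"],
})

totals = accumulate_counts([month1, month2], "userId")
print(totals)
```

Since the helper only iterates over its input, it should also accept a lazy source of frames, for example the iterator that pd.read_sql returns when called with chunksize, so only one chunk has to be in memory at a time.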
