Python, count frequency of occurrence for value in another column

Problem description

I've been scouring Stack Overflow for solutions to similar problems and keep hitting walls. I am new to Python and am using pandas for ETL, so forgive me if I don't describe my situation adequately.

I have two dataframes. df1 looks like:

    Subscriber Key  OtherID  AnotherID
1     'abc'           '12'    '23'
2     'bcd'           '45'    '56'
3     'abc'           '12'    '23'
4     'abc'           '12'    '23'
5     'cde'           '78'    '90'
6     'bcd'           '45'    '56'

df2 looks like:

    Subscriber Key  OtherID  AnotherID
1     'abc'           '12'    '23'
2     'bcd'           '45'    '56'
3     'cde'           '78'    '90'

I am trying to count the number of times each SubscriberKey (for example 'abc') occurs in df1. After finding the counts, I would like to append them to another dataframe (df2), which is my first dataframe deduplicated.

It would look like this:

    Subscriber Key  OtherID  AnotherID Total Instances
1     'abc'           '12'    '23'           '3'
2     'bcd'           '45'    '56'           '2'
3     'cde'           '78'    '90'           '1'

So what I did was try to use this line:

    df1.groupby(['SubscriberKey']).size()

The reason I only used 'SubscriberKey' is that some rows have only that column filled in, with 'OtherID' and 'AnotherID' blank.

I have also tried Series.value_counts(). When I use groupby and size() and set df2['Total Instances'] to the count of occurrences, the values do not line up with the correct keys.

For example, the new table looks like this:

    Subscriber Key  OtherID  AnotherID Total Instances
1     'abc'           '12'    '23'           '1'
2     'bcd'           '45'    '56'           '3'
3     'cde'           '78'    '90'           '2'
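
One plausible cause of this kind of mismatch is assigning the counts by position instead of by key. The sketch below uses made-up sample data (not the data from the question) in which the first key to appear is not the most frequent one, so the order returned by value_counts() differs from the order of the deduplicated keys. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical keys: 'cde' appears first but is the least frequent.
keys_all = pd.Series(["cde", "abc", "abc", "bcd", "abc", "bcd"])

# Deduplicated keys keep first-appearance order: cde, abc, bcd.
keys = keys_all.drop_duplicates().reset_index(drop=True)

# value_counts() orders by count, descending: abc, bcd, cde.
counts = keys_all.value_counts()

# Positional assignment pairs row i with count i, ignoring the keys.
positional = counts.to_numpy().tolist()

# Label-based lookup pairs each key with its own count.
aligned = keys.map(counts).tolist()

print(list(zip(keys, positional)))  # counts attached to the wrong keys
print(list(zip(keys, aligned)))     # counts attached to the right keys
```

The positional version attaches the count 3 to 'cde', which occurs only once, while the label-based lookup gives each key its own count.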

My original thought was that groupby might sort its output automatically. I tried to check by saving the grouped result as a CSV and realized it only writes out the count column, not the associated SubscriberKey column.
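
For reference, here is a minimal sketch of what groupby(...).size() returns, assuming the column is named 'Subscriber Key' as in the table headers above. The result is a Series whose index holds the keys (sorted by key, since groupby sorts by default) and whose values are the counts, so the keys are not an ordinary column. Calling reset_index turns them back into one.

```python
import pandas as pd

# df1 rebuilt from the table in the question.
df1 = pd.DataFrame({
    "Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"],
    "OtherID": ["12", "45", "12", "12", "78", "45"],
    "AnotherID": ["23", "56", "23", "23", "90", "56"],
})

# size() returns a Series: keys live in the index, counts are the values.
counts = df1.groupby("Subscriber Key").size()

# reset_index moves the keys into a regular column next to the counts.
counts_df = counts.reset_index(name="Total Instances")
print(counts_df)
```

Writing counts_df to a file keeps the key and its count side by side, which makes it easier to check which count belongs to which key.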

Does anybody have any input on how I can achieve this? To reiterate, I essentially want to add a column to df2 that gives the total number of occurrences of each key within df1.

Thanks!

Recommended answer

You can try:

    df2['Total Instances'] = df2['Subscriber Key'].map(df1['Subscriber Key'].value_counts())
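
As a rough end-to-end sketch of this approach, the example below rebuilds df1 from the question's table and derives df2 with drop_duplicates on 'Subscriber Key'. That deduplication step is an assumption, since the question does not show how df2 was produced. map() looks up each key of df2 in the index of the value_counts() result, so each count lands on the matching row regardless of row order.

```python
import pandas as pd

# df1 rebuilt from the table in the question.
df1 = pd.DataFrame({
    "Subscriber Key": ["abc", "bcd", "abc", "abc", "cde", "bcd"],
    "OtherID": ["12", "45", "12", "12", "78", "45"],
    "AnotherID": ["23", "56", "23", "23", "90", "56"],
})

# Assumed deduplication step: keep the first row for each key.
df2 = df1.drop_duplicates(subset="Subscriber Key").reset_index(drop=True)

# Counts per key; the keys become the index of this Series.
counts = df1["Subscriber Key"].value_counts()

# map() matches on the key values, not on row position.
df2["Total Instances"] = df2["Subscriber Key"].map(counts)
print(df2)
```

With this sample data the counts come out as 3 for 'abc', 2 for 'bcd', and 1 for 'cde'.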
