Multi-Column Pandas counting based on multiple criteria


Question

I have a list of words that I want to count:

word_list = ['one','two','three']

I also have a pandas DataFrame with a text column, shown below.

TEXT                                       | USER    | ID
-------------------------------------------|---------|------
"Perhaps she'll be the one for me."        | User 1  | 100
"Is it two or one?"                        | User 1  | 100
"Mayhaps it be three afterall..."          | User 2  | 150
"Three times and it's a charm."            | User 2  | 150
"One fish, two fish, red fish, blue fish." | User 2  | 150
"There's only one cat in the hat."         | User 3  | 200
"One does not simply code into pandas."    | User 3  | 200
"Two nights later..."                      | User 1  | 100
"Quoth the Raven... nevermore."            | User 2  | 150

The desired output is shown below. For each word in word_list, I want to count the number of unique users who have text containing that word, using the data in the "TEXT" column. After counting the unique users, I also want the sum of the followers related to each tweet, counted once per unique user of that word (the "ID Sum" column).

Word  | Unique User Count | ID Sum
------|-------------------|-------
one   | 3                 | 450
two   | 2                 | 250
three | 1                 | 150

Is there a way to do this in Python 2.7?

Recommended answer

I'll break this down into steps:

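import numpy as np
import pandas as pd

df.columns=['TEXT','USER','ID']

# Flag, per row, whether each word occurs in the lower-cased text (str.find returns -1 when absent; substring match)
df[word_list]=df.TEXT.str.lower().apply(lambda x : pd.Series([x.find(y) for y in word_list])).ne(-1)
# Turn the flags into 1/NaN so that non-matches can be dropped after stacking
df1=df[['USER','one','two','three','ID']].set_index(['USER','ID']).astype(int).replace({0:np.nan})
# One row per (USER, ID, word) match; dropna() makes the removal of non-matches explicit
Target=df1.stack().dropna().reset_index().groupby('level_2').agg({'USER':lambda x : len(set(x)),'ID':lambda x : sum(set(x))})
Target=Target.reset_index()
Target.columns=['Word','Unique User Count','ID Sum']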
Target
Out[97]: 
    Word  Unique User Count  ID Sum
0    one                  3     450
1  three                  1     150
2    two                  2     250
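
As an alternative sketch (not the answerer's method), the same table can be built by looping over word_list with str.contains and a word-boundary regex, which avoids counting a word that only appears inside a longer one. It assumes the sample data from the question, rebuilt here as df, and a current pandas on Python 3 rather than Python 2.7. The names sample_rows, pattern, mask, users, and result are illustrative only.

```python
import re

import pandas as pd

word_list = ['one', 'two', 'three']

# Sample data copied from the question's table.
sample_rows = [
    ("Perhaps she'll be the one for me.", 'User 1', 100),
    ("Is it two or one?", 'User 1', 100),
    ("Mayhaps it be three afterall...", 'User 2', 150),
    ("Three times and it's a charm.", 'User 2', 150),
    ("One fish, two fish, red fish, blue fish.", 'User 2', 150),
    ("There's only one cat in the hat.", 'User 3', 200),
    ("One does not simply code into pandas.", 'User 3', 200),
    ("Two nights later...", 'User 1', 100),
    ("Quoth the Raven... nevermore.", 'User 2', 150),
]
df = pd.DataFrame(sample_rows, columns=['TEXT', 'USER', 'ID'])

rows = []
for word in word_list:
    # \b limits the match to whole words; case=False ignores capitalisation.
    pattern = r'\b' + re.escape(word) + r'\b'
    mask = df['TEXT'].str.contains(pattern, case=False, regex=True)
    # Keep one row per user so that each user's ID is summed only once.
    users = df.loc[mask, ['USER', 'ID']].drop_duplicates(subset='USER')
    rows.append({
        'Word': word,
        'Unique User Count': len(users),
        'ID Sum': int(users['ID'].sum()),
    })

result = pd.DataFrame(rows, columns=['Word', 'Unique User Count', 'ID Sum'])
print(result)
```

On the sample data this gives the counts in the question's desired output, with rows in word_list order rather than the alphabetical order that groupby produces. De-duplicating on USER, rather than summing set(ID), also keeps the sum correct if two different users happen to share the same ID value.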
