How to split text data and count the number of occurrences in a pandas DataFrame?

Posted: 2020/5/24 4:27:08 · Tags: pandas, dataframe, split


Question

I have data in a dataframe in the following format:

import pandas as pd

df = pd.DataFrame([
    [42,{"tags":["illustration","logo","design","ui"]}],
    [81,{"tags":["typography","icon","vector","ux"]}],
    [98,{"tags":["branding","app"]}],
    [52,{"tags":["animation","web","flat"]}],
    [17,{"tags":["type","lettering"]}],
    [37,{"tags":["illustration","typography","branding","typography","branding"]}],
    [63,{"tags":["logo","icon","app","web","lettering"]}],
    [47,{"tags":["ui","ux"]}],
    [6,{"tags":["design","vector","icon","flat","lettering","branding","app"]}],
    [53,{"tags":["ui","ux","lettering","branding","app","animation","web","flat"]}],
    [64,{"tags":["branding","app","typography","branding"]}],
    [89,{"tags":["typography","branding","ux","lettering","branding"]}]
],columns=["_id","tags"])

I want to count how many '_id' values have each specific number of tags (the distribution of tag counts per post), so for the data above it would be:

Number of posts    Number of tags
     3                 2
     1                 3
     3                 4
     3                 5
     1                 7
     1                 8

How should I handle the text tags in the given format for this task?

Thanks

Recommended answer

Use the DataFrame constructor with Counter and a list comprehension that takes the length of each tags list:

from collections import Counter

# Count how many posts have each tag-list length
c = Counter([len(x['tags']) for x in df['tags']])

# Build a new frame; df itself is left intact so it can be reused
out = pd.DataFrame({'Number of posts': list(c.values()), 'Number of tags': list(c.keys())})
print(out)
   Number of posts  Number of tags
0                3               4
1                3               2
2                1               3
3                3               5
4                1               7
5                1               8
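The rows above come out in the order in which each list length was first seen, because Counter keeps insertion order. If the layout from the question (ordered by number of tags) is wanted, one option is to sort the counter's items before building the frame. The following is only a sketch on a shortened copy of the data; the names `rows` and `out` are made up for illustration.

```python
from collections import Counter

import pandas as pd

# Shortened copy of the question's data, just enough to run the sketch.
df = pd.DataFrame([
    [42, {"tags": ["illustration", "logo", "design", "ui"]}],
    [98, {"tags": ["branding", "app"]}],
    [17, {"tags": ["type", "lettering"]}],
    [52, {"tags": ["animation", "web", "flat"]}],
], columns=["_id", "tags"])

# Map each tag-list length to the number of posts with that length.
c = Counter(len(x["tags"]) for x in df["tags"])

# sorted() on (tag_count, post_count) pairs orders the rows by tag count.
rows = sorted(c.items())
out = pd.DataFrame(rows, columns=["Number of tags", "Number of posts"])

# Put the columns in the order used in the question.
out = out[["Number of posts", "Number of tags"]]
print(out)
```

Sorting the items keeps the Counter approach intact and only changes the row order of the result.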

Or use apply with value_counts:

# Length of each tags list, then how often each length occurs
out = (df['tags'].apply(lambda x: len(x['tags']))
                 .value_counts()
                 .rename_axis('Number of tags')
                 .reset_index(name='Number of posts')
                 [['Number of posts', 'Number of tags']])
print(out)
   Number of posts  Number of tags
0                3               5
1                3               4
2                3               2
3                1               8
4                1               7
5                1               3
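value_counts orders rows by count, and the order among equal counts is probably not worth relying on. To get the rows ordered by number of tags, as in the question, a sort_index call can be added before the index is reset. This is a sketch under the assumption that the data is exactly as posted; `tag_counts` and `out` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([
    [42, {"tags": ["illustration", "logo", "design", "ui"]}],
    [81, {"tags": ["typography", "icon", "vector", "ux"]}],
    [98, {"tags": ["branding", "app"]}],
    [52, {"tags": ["animation", "web", "flat"]}],
    [17, {"tags": ["type", "lettering"]}],
    [37, {"tags": ["illustration", "typography", "branding", "typography", "branding"]}],
    [63, {"tags": ["logo", "icon", "app", "web", "lettering"]}],
    [47, {"tags": ["ui", "ux"]}],
    [6, {"tags": ["design", "vector", "icon", "flat", "lettering", "branding", "app"]}],
    [53, {"tags": ["ui", "ux", "lettering", "branding", "app", "animation", "web", "flat"]}],
    [64, {"tags": ["branding", "app", "typography", "branding"]}],
    [89, {"tags": ["typography", "branding", "ux", "lettering", "branding"]}],
], columns=["_id", "tags"])

# Number of tags per post (duplicates inside a list are counted as-is).
tag_counts = df["tags"].apply(lambda d: len(d["tags"]))

out = (tag_counts.value_counts()
                 .sort_index()                         # order rows by tag count
                 .rename_axis("Number of tags")
                 .reset_index(name="Number of posts")
                 [["Number of posts", "Number of tags"]])
print(out)
```

Note that repeated tags within one post, such as "branding" appearing twice, each add to that post's length here, which matches what both snippets in the answer do.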
