How to get all the unique words in the data frame?


Question

I have a DataFrame with a list of products and their respective reviews:

+-----------+------------------------------------------------+
| product   | review                                         |
+-----------+------------------------------------------------+
| product_a | It's good for a casual lunch                   |
+-----------+------------------------------------------------+
| product_b | Avery is one of the most knowledgable baristas |
+-----------+------------------------------------------------+
| product_c | The tour guide told us the secrets             |
+-----------+------------------------------------------------+

How do I get all the unique words in the DataFrame?

I wrote a function:

from collections import Counter

def count_words(text):
    """Return a Counter of the lowercase words in text."""
    try:
        words = text.lower().split()
    except AttributeError:
        # Non-string values such as NaN have no .lower(); count nothing.
        return Counter()
    return Counter(words)

I then applied the function to the DataFrame, but that only gives me the word counts for each row:

reviews['words_count'] = reviews['review'].apply(count_words)


Recommended answer

dfx
               review
0      United Kingdom
1  The United Kingdom
2     Dublin, Ireland
3    Mardan, Pakistan

To get all the unique words in the "review" column:

list(dfx['review'].str.split(' ', expand=True).stack().unique())

['United', 'Kingdom', 'The', 'Dublin,', 'Ireland', 'Mardan,', 'Pakistan']
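
Splitting on spaces leaves punctuation attached, which is why 'Dublin,' and 'Mardan,' keep their commas. A minimal sketch of one way around this follows. It assumes that a word is any run of `\w` characters and that case should be ignored; both are assumptions, not part of the original answer. It uses `str.findall` to extract the words and `explode` to flatten them:

```python
import pandas as pd

# Rebuild the example frame from the answer.
dfx = pd.DataFrame({"review": ["United Kingdom", "The United Kingdom",
                               "Dublin, Ireland", "Mardan, Pakistan"]})

# Lowercase each review, pull out runs of word characters (the assumed
# definition of a "word"), and flatten the per-row lists into one Series.
words = dfx["review"].str.lower().str.findall(r"\w+").explode()

# dropna() discards rows that were missing or contained no words.
unique_words = words.dropna().unique().tolist()
print(unique_words)
```

For the example frame this yields seven lowercase words with the commas removed. Whether lowercasing is wanted depends on the use case; dropping `.str.lower()` keeps the original casing.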

To get the count of each word in the "review" column:

dfx['review'].str.split(' ', expand=True).stack().value_counts()


United      2
Kingdom     2
Mardan,     1
The         1
Ireland     1
Dublin,     1
Pakistan    1
dtype: int64
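
The per-row counting from the question can also be kept and merged into one overall count. The sketch below assumes a hypothetical helper named `total_word_counts` (it is not in the original post) that accumulates one `Counter` across all reviews. It accepts any iterable of strings, so a column such as `reviews['review']` could be passed directly:

```python
from collections import Counter

def total_word_counts(reviews):
    """Merge word counts from every review into one Counter (hypothetical helper)."""
    total = Counter()
    for text in reviews:
        # Skip non-string entries such as NaN in a pandas column.
        if isinstance(text, str):
            total.update(text.lower().split())
    return total

counts = total_word_counts(["United Kingdom", "The United Kingdom",
                            "Dublin, Ireland", "Mardan, Pakistan"])
unique_words = list(counts)  # the keys are the unique words
print(counts.most_common(2))
```

Because this version splits on whitespace, as the question's function does, punctuation stays attached to words such as 'dublin,'.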
