Generate word cloud from single-column Pandas dataframe

Question

I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with words sized based on their frequency within the dataframe.

I have attempted to do this with the following code:

To bring the data in:

import pandas as pd

fields = ['Crime type']

text2 = pd.read_csv('allCrime.csv', usecols=fields)

To generate the word cloud:

import matplotlib.pyplot as plt
from wordcloud import WordCloud

wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()

However, I get this error:

TypeError: expected string or bytes-like object

I was able to create an earlier word cloud from the full dataset with the following code, but I want the word cloud to use only the words from one specific column, 'Crime type' ('allCrime.csv' contains approx. 13 columns):

text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

I'm new to Python and Pandas (and coding generally!) so all help is gratefully received.

Solution

The problem is that the WordCloud.generate method you are using expects a single string and counts the word occurrences in it. You are passing it a pandas DataFrame instead, because pd.read_csv returns a DataFrame even when usecols selects only one column.
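
Checking the types involved is a quick way to see the mismatch. The sketch below is a minimal illustration. It uses a small in-memory CSV with made-up rows in place of the real allCrime.csv. Only the 'Crime type' column name comes from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for allCrime.csv; the rows are made-up sample values.
csv_data = io.StringIO(
    "Crime type,Month\n"
    "Burglary,2017-01\n"
    "Vehicle crime,2017-01\n"
    "Burglary,2017-02\n"
)

# read_csv returns a DataFrame, even when usecols keeps a single column.
text2 = pd.read_csv(csv_data, usecols=["Crime type"])
print(type(text2))  # <class 'pandas.core.frame.DataFrame'>

# Selecting the column by name gives a Series of strings.
crime_types = text2["Crime type"]
print(type(crime_types))  # <class 'pandas.core.series.Series'>

# Joining the Series yields the single str that WordCloud.generate expects.
text = " ".join(crime_types)
print(repr(text))  # 'Burglary Vehicle crime Burglary'
```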

Depending on what you want the word cloud to be built from, you can either:

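  1. wordcloud2 = WordCloud().generate(' '.join(text2['Crime type'])). This concatenates all entries in the column into one string and then counts the words in it. Multi-word categories are split into separate words. The join also raises a TypeError if the column contains missing values, so drop them first with .dropna().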

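  2. Use WordCloud.generate_from_frequencies to pass precomputed frequencies yourself. For example, a dict built with text2['Crime type'].value_counts().to_dict() sizes each crime type as a whole phrase.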
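
Both options can be sketched as small helpers. This is an illustrative sketch, not the answer's own code. The function names column_to_text, column_frequencies and show_word_cloud are hypothetical. The sketch assumes the wordcloud and matplotlib packages are installed by the time show_word_cloud is called.

```python
import pandas as pd


def column_to_text(df: pd.DataFrame, column: str) -> str:
    """Option 1: join every non-missing entry of one column into a single string."""
    # dropna() removes empty cells (' '.join would fail on a float NaN);
    # astype(str) guards against any non-string values that remain.
    return " ".join(df[column].dropna().astype(str))


def column_frequencies(df: pd.DataFrame, column: str) -> dict:
    """Option 2: map each distinct value of one column to its row count."""
    # value_counts() ignores missing values by default.
    return df[column].value_counts().to_dict()


def show_word_cloud(source) -> None:
    """Plot a word cloud from either a str (option 1) or a {word: count} dict (option 2)."""
    # Imported here so the two helpers above work without wordcloud installed.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    wc = WordCloud()
    if isinstance(source, dict):
        wc.generate_from_frequencies(source)
    else:
        wc.generate(source)
    plt.imshow(wc, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# Usage, assuming allCrime.csv is in the working directory:
# text2 = pd.read_csv("allCrime.csv", usecols=["Crime type"])
# show_word_cloud(column_to_text(text2, "Crime type"))      # option 1
# show_word_cloud(column_frequencies(text2, "Crime type"))  # option 2
```

The frequencies route skips WordCloud's own tokenisation and stopword filtering. Each crime type is therefore drawn as one phrase, sized by the number of rows it appears in. That is closer to the goal of visualising the 16 categories.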
