Generate word cloud from single-column Pandas dataframe
Problem description
I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with words sized based on their frequency within the dataframe.
I have attempted to do this with the following code:
To bring the data in:
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)
To generate the word cloud:
wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
However, I get this error:
TypeError: expected string or bytes-like object
I was able to create an earlier word cloud from the full dataset using the following code, but I want the word cloud to use only the words from one specific column, 'Crime type' ('allCrime.csv' contains approx. 13 columns):
text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
I'm new to Python and Pandas (and coding generally!) so all help is gratefully received.
The problem is that the WordCloud.generate method you are using expects a single string, in which it counts the word instances, but you are passing it a pd.DataFrame (pd.read_csv returns a DataFrame even when usecols selects only one column).
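To make the type mismatch concrete, here is a minimal sketch that uses a small in-memory CSV as a hypothetical stand-in for allCrime.csv (the sample rows and the Month column are invented for illustration). It shows that read_csv with usecols still returns a DataFrame, while selecting the column by name gives a Series that can be joined into the single string generate expects:

```python
import io

import pandas as pd

# Hypothetical stand-in for allCrime.csv: two columns, a few rows.
csv_text = """Crime type,Month
Burglary,2017-01
Vehicle crime,2017-01
Burglary,2017-02
"""

# usecols keeps only the listed column, but the result is still a DataFrame.
text2 = pd.read_csv(io.StringIO(csv_text), usecols=["Crime type"])
print(type(text2))  # <class 'pandas.core.frame.DataFrame'>

# Selecting the column by name gives a Series of strings.
crime_types = text2["Crime type"]
print(type(crime_types))  # <class 'pandas.core.series.Series'>

# dropna() skips blank cells (read as float NaN, which str.join would reject);
# joining the remaining values yields the single str WordCloud.generate expects.
text = " ".join(crime_types.dropna())
print(text)  # Burglary Vehicle crime Burglary
```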
Depending on what you want the word cloud to be generated from, you can either:
- join the column into one string, which concatenates all the words in your dataframe column and then counts every instance:
  wordcloud2 = WordCloud().generate(' '.join(text2['Crime type']))
- or use WordCloud.generate_from_frequencies to pass in word frequencies you have computed yourself.
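A minimal sketch of the second option, assuming the column is named 'Crime type' and using an invented in-memory sample instead of allCrime.csv. Series.value_counts counts each complete category string, so multi-word categories stay whole; with the join approach, generate splits the text into individual words (and drops common stopwords such as "and"). show_word_cloud is a hypothetical helper name; it imports wordcloud and matplotlib inside the function only so that the counting step can run on its own.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the 'Crime type' column of allCrime.csv.
csv_text = """Crime type
Anti-social behaviour
Burglary
Anti-social behaviour
Violence and sexual offences
Anti-social behaviour
Burglary
"""
text2 = pd.read_csv(io.StringIO(csv_text))

# value_counts() counts each full category string (most frequent first), so
# "Violence and sexual offences" stays one entry instead of being split.
frequencies = text2["Crime type"].value_counts().to_dict()
print(frequencies)


def show_word_cloud(freqs):
    """Draw a word cloud whose word sizes follow a {word: count} mapping."""
    # Imported here (an assumption for this sketch) so the counting above
    # works even where wordcloud/matplotlib are not installed.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    cloud = WordCloud().generate_from_frequencies(freqs)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()
```

With the real file, text2 would come from pd.read_csv('allCrime.csv', usecols=['Crime type']) as in the question, and calling show_word_cloud(frequencies) draws the plot, with each crime category sized by how often it occurs.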