How to create a wordcloud according to frequencies in a pandas dataframe


Problem description


      I have to plot a wordcloud. 'tweets.csv' is loaded into a pandas DataFrame that has a column named 'text'. The plotted graph isn't based on the most common words, though. How can the word sizes be linked to their frequencies in the dataframe?

      text = df_final.text.values
      wordcloud = WordCloud(
          #mask = logomask,
          max_words = 1000,
          width = 600,
          height = 400,
          #max_font_size = 1000,
          #min_font_size = 100,
          normalize_plurals = True,
          #scale = 5,
          #relative_scaling = 0,
          background_color = 'black',
          stopwords = STOPWORDS.union(stopwords)
      ).generate(str(text))
      fig = plt.figure(
          figsize = (50,40),
          facecolor = 'k',
          edgecolor = 'k')
      plt.imshow(wordcloud, interpolation = 'bilinear')
      plt.axis('off')
      plt.tight_layout(pad=0)
      plt.show()
      

      My dataframe looks like this:

      0   RT @Pontifex_pt: Temos que descobrir as riquezezas ...
      1   RT @Pontifex_pt: Todos estamos em viagem rumo ...
      2   RT @Pontifex_pt: Unamos as forças, em todos ...
      3   RT @GeneralMourao: #Segurançapública, preocupa ...
      4   RT @FIFAcom: The Brasileirao U-17 final provided ...
      

      Solution

      Set up a sample DataFrame:

      import pandas as pd
      
      df = pd.DataFrame({'word': ['how', 'are', 'you', 'doing', 'this', 'afternoon'],
                         'count': [7, 10, 4, 1, 20, 100]}) 
      

      Convert the word & count columns to a dict

      • WordCloud().generate_from_frequencies() requires a dict

      data = dict(zip(df['word'].tolist(), df['count'].tolist()))
      
      print(data)
      
      # prints: {'how': 7, 'are': 10, 'you': 4, 'doing': 1, 'this': 20, 'afternoon': 100}
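
The same dict can also be built with pandas alone. The following is a minimal sketch, assuming the sample `word` and `count` columns from above; the names `data_alt`, `df_dupes` and `data_summed`, and the extra duplicate row, are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'word': ['how', 'are', 'you', 'doing', 'this', 'afternoon'],
                   'count': [7, 10, 4, 1, 20, 100]})

# Index by word, then turn the count Series into a plain dict.
data_alt = df.set_index('word')['count'].to_dict()

# If a word could appear on several rows, sum its counts first.
# The extra 'you' row below is an illustrative assumption, not part of the original data.
df_dupes = pd.concat([df, pd.DataFrame({'word': ['you'], 'count': [6]})],
                     ignore_index=True)
data_summed = df_dupes.groupby('word')['count'].sum().to_dict()
```

With `dict(zip(...))`, a repeated word silently keeps only its last count, so the `groupby` variant is likely the safer choice when the column is not already unique.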
      

      Wordcloud:

      from wordcloud import WordCloud
      
      wc = WordCloud(width=800, height=400, max_words=200).generate_from_frequencies(data)
      

      Plot

      import matplotlib.pyplot as plt
      
      plt.figure(figsize=(10, 10))
      plt.imshow(wc, interpolation='bilinear')
      plt.axis('off')
      plt.show()
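
The sample DataFrame above already contains counts, whereas the DataFrame in the question holds raw tweet text. Note also that calling `str()` on a NumPy array returns the array's printed representation (abbreviated with `...` for long arrays), which may be why the original cloud did not reflect the real frequencies. The following is a minimal sketch of counting words first, using only the standard library. The function name `word_frequencies`, the sample sentences and the stopword set are hypothetical, and the `\w+` tokenization is a simplifying assumption.

```python
import re
from collections import Counter

def word_frequencies(texts, stopwords=frozenset()):
    """Count lowercase word occurrences across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # \w+ matches runs of letters, digits and underscores.
        for word in re.findall(r"\w+", str(text).lower()):
            if word not in stopwords:
                counts[word] += 1
    return dict(counts)

# Illustrative input; in the question this would be the 'text' column.
sample_texts = ["The cat sat", "the cat ran", "A dog ran"]
freqs = word_frequencies(sample_texts, stopwords={"the", "a"})
print(freqs)
```

A pandas column can be passed directly, for example `word_frequencies(df_final['text'], STOPWORDS)`, and the resulting dict handed to `generate_from_frequencies`. Stopwords are filtered here because, as far as I know, `generate_from_frequencies` does not apply the `stopwords` argument of `WordCloud`.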
      

      Using an image mask:

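      import numpy as np
      from PIL import Image

      # Load the mask image; when a mask is given, WordCloud uses the mask's shape and ignores width/height.
      twitter_mask = np.array(Image.open('twitter.png'))
      # data is the word -> count dict built above.
      wc = WordCloud(background_color='white', width=800, height=400, max_words=200, mask=twitter_mask).generate_from_frequencies(data)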
      
      plt.figure(figsize=(10, 10))
      plt.imshow(wc, interpolation='bilinear')
      plt.axis("off")
      plt.figure()
      plt.imshow(twitter_mask, cmap=plt.cm.gray, interpolation='bilinear')
      plt.axis("off")
      plt.show()
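
If no image file such as 'twitter.png' is at hand, a mask can be built directly as an array. The following is a minimal sketch, assuming the convention from the wordcloud documentation that pure white (255) entries are masked out and everything else is free to draw on. The size, radius and the name `circle_mask` are arbitrary choices for illustration.

```python
import numpy as np

size = 300     # mask is size x size pixels (arbitrary)
radius = 130   # radius of the drawable circle (arbitrary)

# Open grids of row and column indices that broadcast into a 2D distance test.
y, x = np.ogrid[:size, :size]
outside = (x - size // 2) ** 2 + (y - size // 2) ** 2 > radius ** 2

# 255 (white) outside the circle is masked out; 0 inside is drawable.
circle_mask = (255 * outside).astype(np.uint8)
```

Passing `mask=circle_mask` to `WordCloud(...)` in place of `twitter_mask` should then confine the words to a circle.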
      

