Plotting a word-cloud by date for a twitter search result? (using R)


Problem description

I wish to search twitter for a word (let's say #google) and then be able to generate a tag cloud of the words used in the tweets, but according to date (for example, having a moving window of an hour that advances by 10 minutes each time, showing me how different words became more frequently used throughout the day).

I would appreciate any help on how to go about doing this, regarding: resources for the information, code for the programming (R is the only language I am apt at using), and ideas for visualization. Questions:


  1. How do I get the information?

In R, I found that the twitteR package has the searchTwitter command. But I don't know how big an "n" I can get from it. Also, it doesn't return the dates on which the tweets originated.

I see here that I could get up to 1,500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes I would need tens of thousands of tweets. Is it even possible to get them in retrospect? (for example, by asking for older posts each time through the API URL?) If not, there is the more general question of how to create a personal archive of tweets on your home computer (a question which might be better left to another SO thread, although any insights from people here would be very interesting for me to read).
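For what it's worth, here is a minimal sketch of fetching tweets together with their timestamps via twitteR. Treat the details as assumptions: the Twitter API and the package's authentication setup have changed over the years, the placeholder credentials must come from your own developer account, and the `since`/`until` date strings shown are illustrative:

```r
# install.packages("twitteR")
library(twitteR)

# Authenticate first (keys come from your Twitter developer account)
setup_twitter_oauth(consumer_key    = "...",
                    consumer_secret = "...",
                    access_token    = "...",
                    access_secret   = "...")

# Search for up to 1500 tweets; since/until narrow the date range
tweets <- searchTwitter("#google", n = 1500,
                        since = "2011-01-01", until = "2011-01-02")

# Each status object does carry its creation time
first <- tweets[[1]]
first$text     # the tweet text
first$created  # a POSIXct timestamp, so dates ARE available here
```

So even though the search results print as plain text, each status object holds a `created` field you can window on later.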

  2. How do I parse the information (in R)? I know that R has functions that could help, from the RCurl and twitteR packages, but I don't know which ones or how to use them. Any suggestions would be of help.
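One parsing shortcut worth knowing (a sketch, assuming `tweets` is a list of status objects as returned by twitteR's `searchTwitter`): the package's `twListToDF` flattens the list into a data frame, which is much easier to slice by time than the raw objects:

```r
library(twitteR)

# Convert a list of status objects into a data frame;
# the 'text' and 'created' columns are what we need downstream
df <- twListToDF(tweets)
head(df[, c("text", "created")])

# 'created' is POSIXct, so sorting and window-cutting are straightforward
df <- df[order(df$created), ]
```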

  3. How do I analyse it? How do I remove all the "not interesting" words? I found that the "tm" package in R has this example:

reuters <- tm_map(reuters, removeWords, stopwords("english"))

Would this do the trick? Should I do something else/more?
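Stopword removal is the usual first step; in practice you would chain a few more `tm_map` cleanup passes before counting words. A sketch, assuming the tweet texts sit in a character vector (the two example strings below are stand-ins, not real data):

```r
library(tm)

txts <- c("Google announces new search features",
          "I love the new Google doodle today")   # stand-in tweet texts

corpus <- Corpus(VectorSource(txts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Word frequencies, ready for a tag cloud
tdm   <- TermDocumentMatrix(corpus)
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
head(freqs)
```

(In newer versions of tm, base transformations like `tolower` must be wrapped in `content_transformer`; older versions accepted them directly.)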

Also, I imagine I would like to do that after cutting my dataset according to time, which will require some POSIX-like functions (I am not exactly sure which would be needed here, or how to use them).
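The window-cutting itself only needs base R's POSIXct arithmetic: an hour-wide window that slides by 10 minutes can be built with `seq()` and then used to subset the tweets. A self-contained sketch (the `created` vector below is simulated; in practice it would be the timestamp column of your tweet data frame):

```r
# Toy timestamps standing in for real tweet times
set.seed(1)
created <- as.POSIXct("2011-01-01 00:00", tz = "UTC") +
           sort(runif(200, 0, 6 * 3600))   # 200 tweets over 6 hours

win_len <- 3600   # window width: 1 hour (in seconds)
step    <- 600    # slide by 10 minutes

starts <- seq(from = min(created), to = max(created) - win_len, by = step)

# For each window, the indices of tweets falling inside it
windows <- lapply(starts, function(s) {
  which(created >= s & created < s + win_len)
})

length(windows)          # number of 10-minute steps
sapply(windows, length)  # tweet count per window
```

Each element of `windows` can then be fed through the tm cleanup and into a cloud, giving one cloud per 10-minute step.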

  4. And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions/recommendations?
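One option is the dedicated `wordcloud` package, which draws the cloud directly from a word/frequency table. A sketch with made-up words and frequencies (in practice these would come from a `TermDocumentMatrix`):

```r
# install.packages("wordcloud")
library(wordcloud)
library(RColorBrewer)

words <- c("google", "search", "android", "maps", "chrome")
freqs <- c(50, 30, 25, 12, 8)   # e.g. row sums of a TermDocumentMatrix

wordcloud(words = words, freq = freqs, min.freq = 1,
          scale  = c(4, 0.5),              # largest/smallest word size
          colors = brewer.pal(5, "Dark2"))
```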

I believe I am asking a huge question here, but I tried to break it into as many straightforward questions as possible. Any help will be welcomed!

Best,

Tal

Accepted answer


  • Word/Tag cloud in R using the "snippets" package
  • www.wordle.net

Using the openNLP package you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs, or adjectives for visualization in a word cloud.
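A sketch of that POS-tagging idea. The openNLP interface has changed across versions; the annotator-based API below also needs the separate `openNLPmodels.en` model package, so treat the exact calls as assumptions rather than a drop-in recipe:

```r
library(NLP)
library(openNLP)
# Also requires the English models:
# install.packages("openNLPmodels.en",
#                  repos = "http://datacube.wu.ac.at/")

s <- as.String("Google releases a new search feature today")

# Tokenize into sentences and words, then tag each word's part of speech
a <- annotate(s, list(Maxent_Sent_Token_Annotator(),
                      Maxent_Word_Token_Annotator()))
a <- annotate(s, Maxent_POS_Tag_Annotator(), a)

words <- a[a$type == "word"]
tags  <- sapply(words$features, `[[`, "POS")

# Keep only the nouns (Penn Treebank tags starting with "NN") for the cloud
s[words][grepl("^NN", tags)]
```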
