Twitter (Social Networking) Dataset


Question



I am looking for a Twitter or other social networking site dataset for my project. I currently have the CAW 2.0 Twitter dataset, but it only contains users' tweets. I want data that shows the number of friends, followers, and so on.

It does not have to be Twitter, but I would prefer Twitter or Facebook. I already tried Infochimps, but apparently the Twitter file is no longer downloadable.

Can someone point me to good websites for finding this kind of dataset? I am going to feed the dataset to Hadoop.

Answer

Try the following three datasets:

Contains around 97 million tweets:

http://demeter.inf.ed.ac.uk/index.php?option=com_content&view=article&id=2:test-post-for-twitter&catid=1:twitter&Itemid=2

Ed. note: the dataset previously linked above is no longer available because Twitter requested its removal.

Contains a user graph of 47 million users:

http://an.kaist.ac.kr/traces/WWW2010.html
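Friend and follower counts, which the question asks for, can be derived from a user graph in a single MapReduce pass, the style of job the asker plans to run on Hadoop. The sketch below is hypothetical: it assumes each input line is a whitespace-separated pair `user follower` meaning `follower` follows `user` (check the dataset's own README for its actual format), and the names `mapper`, `reducer`, and `count_degrees` are illustrative, not part of any library. The `sorted()` call stands in for Hadoop's shuffle-and-sort phase so the logic can be tried locally.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: for each edge "user<TAB>follower" (assumed format), emit one
    'follower' hit for the followed user and one 'friend' hit for the follower."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, follower = parts
        yield user, "follower"
        yield follower, "friend"


def reducer(pairs):
    """Reduce phase: `pairs` must arrive sorted by user ID, as Hadoop's
    shuffle/sort guarantees; yields (user, followers, friends)."""
    for user, group in groupby(pairs, key=lambda kv: kv[0]):
        followers = friends = 0
        for _, kind in group:
            if kind == "follower":
                followers += 1
            else:
                friends += 1
        yield user, followers, friends


def count_degrees(lines):
    """Run map, a local stand-in for the shuffle (sorted), then reduce."""
    shuffled = sorted(mapper(lines))
    return {user: (fo, fr) for user, fo, fr in reducer(shuffled)}


if __name__ == "__main__":
    edges = ["1\t2", "1\t3", "2\t3"]  # toy edge list, not real data
    for user, (followers, friends) in sorted(count_degrees(edges).items()):
        print(user, followers, friends)
```

On the toy edges this prints `1 2 0`, `2 1 1`, and `3 0 2` (user, followers, friends). In a real Hadoop Streaming job the mapper and reducer would be separate scripts that read lines from stdin and write tab-separated key/value lines to stdout.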

The following dataset contains both the network and tweets; however, the data was collected by snowball sampling or something similar, so the friends network is not uniform. It has around 10 million tweets, and you can email the researcher for more data.

http://www.public.asu.edu/~mdechoud/datasets.html
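Snowball sampling crawls outward from a few seed users by following their connections, so users close to the seeds are over-represented and users not reachable from the seeds never appear; this is why such a friends network is not uniform. The breadth-first sketch below illustrates the idea under stated assumptions: `snowball_sample` is a hypothetical helper, the graph is a plain dict mapping a user ID to its connected user IDs, and the researchers' actual collection procedure may differ.

```python
from collections import deque


def snowball_sample(graph, seeds, max_users):
    """Breadth-first crawl from `seeds`, enqueueing each visited user's
    neighbors, until `max_users` users are collected or the crawl runs out.

    `graph` maps a user ID to a list of user IDs it is connected to
    (hypothetical adjacency-dict format)."""
    sampled = []
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    while queue and len(sampled) < max_users:
        user = queue.popleft()
        sampled.append(user)
        for neighbor in graph.get(user, []):
            if neighbor not in seen:  # each user is enqueued at most once
                seen.add(neighbor)
                queue.append(neighbor)
    return sampled


if __name__ == "__main__":
    # Toy graph: "g" follows "a", but nothing links to "g", so a crawl
    # seeded at "a" can never reach it.
    toy = {"a": ["b", "c", "d"], "b": ["a", "e"], "d": ["f"], "g": ["a"]}
    print(snowball_sample(toy, ["a"], 4))
```

With a budget of 4 the crawl returns `['a', 'b', 'c', 'd']`, and even with an unlimited budget user `g` is never sampled, which is the kind of bias the answer warns about.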

Do have a look at the license the data is distributed under, though.

Hope this helps. Also, can you tell me what kind of work you are planning with this dataset? I have a few Hadoop/Pig scripts to use with it.

