Twitter (Social networking) Dataset
I am looking for a dataset from Twitter or another social networking site for my project. I currently have the CAW 2.0 Twitter dataset, but it only contains users' tweets. I want data that also shows each user's number of friends, followers, and so on.
It does not have to be Twitter, but I would prefer Twitter or Facebook. I already tried Infochimps, but apparently the Twitter file is no longer downloadable.
Can someone recommend good websites for finding this kind of dataset? I am going to feed the dataset to Hadoop.
Try the following three datasets:
Contains around 97 million tweets:
Ed. note: the dataset previously linked here is no longer available because Twitter requested its removal.
Contains the user graph of 47 million users:
http://an.kaist.ac.kr/traces/WWW2010.html
The following dataset contains the network as well as tweets; however, the data was collected by snowball sampling or something similar, so the friends network is not uniform. It has around 10 million tweets, and you can email the researcher for even more data.
http://www.public.asu.edu/~mdechoud/datasets.html
Do have a look at the license the data is distributed under, though.
Hope this helps. Also, can you tell me what kind of work you are planning with this dataset? I have a few Hadoop / Pig scripts to use with the dataset.
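Since the question asks for friend and follower counts, here is a minimal sketch of how a user-graph edge list could be reduced to those numbers before porting the logic to Hadoop or Pig. It assumes a hypothetical format of one edge per line, `user_id` then `follower_id`, separated by whitespace; the real files may be laid out differently, so check the dataset's documentation first. The function name `count_degrees` is made up for illustration.

```python
from collections import Counter


def count_degrees(lines):
    """Count followers and friends per user from an edge list.

    Assumed (hypothetical) line format: "<user_id> <follower_id>",
    meaning follower_id follows user_id. Malformed lines are skipped.
    """
    followers = Counter()  # user -> number of accounts following them
    friends = Counter()    # user -> number of accounts they follow
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed lines
        user, follower = parts
        followers[user] += 1
        friends[follower] += 1
    return followers, friends


if __name__ == "__main__":
    # Tiny made-up sample; with a real file, pass an open file handle instead.
    sample = ["1\t2", "1\t3", "2\t1"]
    followers, friends = count_degrees(sample)
    for user in sorted(set(followers) | set(friends)):
        print(user, followers[user], friends[user])
```

This in-memory version is only practical for a sample of the data. For the full graph, the same per-line logic maps onto a map step that emits a key for each side of the edge and a reduce step that sums the counts.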