Download large data for Hadoop


Question

I need a large dataset (more than 10 GB) to run a Hadoop demo. Does anybody know where I can download one? Please let me know.

Answer

I would suggest downloading the Million Song Dataset from the following website:

http://labrosa.ee.columbia.edu/millionsong/

The best thing about the Million Song Dataset is that you can download a 1 GB (about 10,000 songs), 10 GB, 50 GB, or roughly 300 GB dataset to your Hadoop cluster and run whatever tests you want. I love using it and have learned a lot from this dataset.

To start with, you can download the portion of the dataset beginning with any single letter from A to Z, which ranges from 1 GB to 20 GB. You can also use the Infochimps site:

http://www.infochimps.com/collections/million-songs
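
Once a subset is downloaded, getting it onto the cluster is essentially an HDFS copy. Below is a minimal sketch in Python, assuming the archive has already been downloaded and unpacked on a local disk; the local and HDFS paths are placeholders, not part of the original answer.

```python
# Minimal sketch: copy a locally downloaded Million Song Dataset subset
# into HDFS so Hadoop jobs can read it. Assumes the `hadoop` CLI is on
# PATH; the local and HDFS paths below are hypothetical placeholders.
import subprocess

LOCAL_DIR = "/data/millionsongsubset"   # hypothetical local extract location
HDFS_DIR = "/user/hadoop/millionsong"   # hypothetical HDFS target directory

def run(cmd):
    """Run a command and fail loudly if it returns a non-zero status."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the target directory in HDFS (no-op if it already exists).
run(["hadoop", "fs", "-mkdir", "-p", HDFS_DIR])

# Upload the extracted subset; -f overwrites files left by earlier runs.
run(["hadoop", "fs", "-put", "-f", LOCAL_DIR, HDFS_DIR])

# Sanity check: list what landed in HDFS and report total size.
run(["hadoop", "fs", "-ls", HDFS_DIR])
run(["hadoop", "fs", "-du", "-s", "-h", HDFS_DIR])
```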

In one of my blog posts, I showed how to download the 1 GB dataset and run Pig scripts on it:

http://blogs.msdn.com/b/avkashchauhan/archive/2012/04/12/processing-million-songs-dataset-with-pig-scripts-on-apache-hadoop-on-windows-azure.aspx
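
The linked post uses Pig; as a rough Python stand-in (not the blog's actual scripts), here is a minimal Hadoop Streaming mapper/reducer pair that counts songs per artist. It assumes the HDF5 files have first been flattened into tab-separated lines of artist and title; that flattening step and the column layout are assumptions for illustration only.

```python
#!/usr/bin/env python3
# songs_per_artist.py -- run as both -mapper and -reducer of a Hadoop
# Streaming job. Assumes each input line is "artist<TAB>title"; this
# layout is an illustration, not the Million Song Dataset's raw HDF5 format.
import sys

def mapper():
    # Emit "artist<TAB>1" for every song record on stdin.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            print(f"{fields[0]}\t1")

def reducer():
    # Sum the counts per artist. Hadoop Streaming delivers lines sorted
    # by key, so records for the same artist arrive consecutively.
    current, count = None, 0
    for line in sys.stdin:
        artist, value = line.rstrip("\n").split("\t", 1)
        if artist != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = artist, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    # `python3 songs_per_artist.py map` for the mapper, `... reduce` for the reducer.
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    mapper() if stage == "map" else reducer()
```

Such a pair would be submitted with the stock Hadoop Streaming jar (hadoop jar hadoop-streaming-*.jar -input ... -output ... -mapper ... -reducer ...); the exact jar path depends on your distribution.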
