Use wget with Hadoop?


Question


I have a dataset (~31GB, a zipped file with the .gz extension) hosted at a web location, and I want to run my Hadoop program on it. The program is a slight modification of the original WordCount example that ships with Hadoop. In my case, Hadoop is installed on a remote machine (to which I connect via ssh and then run my jobs). The problem is that I can't transfer this large dataset to my home directory on the remote machine (due to a disk usage quota). So I tried searching for a way to use wget to fetch the dataset and pass it directly onto HDFS (without saving it on my local account on the remote machine), but had no luck. Does such a way even exist? Any other suggestions to get this working?


I've already tried using the Yahoo! VM, which comes pre-configured with Hadoop, but it's too slow and, on top of that, runs out of memory since the dataset is large.

Answer

Have a look at the answer here: putting a remote file into hadoop without copying it to local disk.


You can pipe the data from wget to hdfs.
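As a minimal sketch of that approach (the URL and HDFS path below are placeholders, not from the original question): hadoop fs -put reads from standard input when the source is "-", so the download never has to touch the local disk.

    # Stream the compressed dataset from the web straight into HDFS.
    # Placeholder URL and HDFS path -- substitute your own.
    wget -qO- http://example.com/dataset.gz | hadoop fs -put - /user/me/dataset.gz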


However, you will have a problem - gz is not splittable so you won't be able to run a distributed map/reduce on it.


I suggest you download the file locally, unzip it and then either pipe it in or split it into multiple files and load them into hdfs.
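A rough sketch of those two options (again with placeholder URL and paths, and streaming the download so the quota-limited local account is barely used):

    # Option 1: decompress on the fly and pipe the plain text into HDFS,
    # so nothing needs to be stored locally at all.
    wget -qO- http://example.com/dataset.gz | gunzip -c | hadoop fs -put - /user/me/dataset.txt

    # Option 2: with some local scratch space, split the unzipped text into
    # smaller files and load them all into an HDFS directory.
    gunzip -c dataset.gz | split -l 1000000 - part_
    hadoop fs -mkdir /user/me/dataset
    hadoop fs -put part_* /user/me/dataset/

Either way, the uncompressed text (unlike the .gz file) can be split across mappers, which is what makes the extra step worthwhile.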

