hadoop getmerge to another machine


Problem description

Is it possible to store the output of the hadoop dfs -getmerge command on another machine?

The reason is that there is not enough space on my local machine. The job output is 100 GB and my local storage is 60 GB.

Another possible reason is that I want to process the output with another program on a different machine, and I don't want to transfer it twice (HDFS -> local FS -> remote machine). I just want (HDFS -> remote machine).

I am looking for something similar to how scp works, like:

hadoop dfs -getmerge /user/hduser/Job-output user@someIP:/home/user/

Alternatively, I would also like to get the HDFS data from a remote host to my local machine.

Could Unix pipelines be used in this case?

For those who are not familiar with hadoop, I am just looking for a way to replace a local dir parameter (/user/hduser/Job-output) in this command with a directory on a remote machine.

Solution

This will do exactly what you need:

hadoop fs -cat /user/hduser/Job-output/* | ssh user@remotehost.com "cat >mergedOutput.txt"

fs -cat will read all files in sequence and output them to stdout.
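To sanity-check the pipeline before shipping roughly 100 GB across the network, the merged stream can be previewed locally first; this is just a quick check using the example directory from the question:

hadoop fs -cat /user/hduser/Job-output/* | head -n 20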

ssh will pass them into a file on the remote machine (note that scp will not accept stdin as input).
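Since the job output is around 100 GB, compressing the stream in flight can cut the transfer time considerably. A minimal sketch, assuming gzip is available on both machines (the output file name is just illustrative):

hadoop fs -cat /user/hduser/Job-output/* | gzip -c | ssh user@remotehost.com "gunzip -c > mergedOutput.txt"

The opposite direction mentioned in the question (pulling the merged HDFS data down to the local machine) works the same way, assuming the remote host has a configured Hadoop client with access to the cluster:

ssh user@remotehost.com "hadoop fs -cat /user/hduser/Job-output/*" > mergedOutput.txt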
