Retrieve files from remote HDFS


Question

My local machine does not have HDFS installed. I want to retrieve files from a remote HDFS cluster. What is the best way to do this? Do I need to copy the files from HDFS to the local file system of one of the cluster machines and then retrieve them over ssh? I want to be able to do this programmatically, say from a bash script.

Answer

Here are the steps:

• Make sure there is network connectivity between your host and the target cluster.
• Configure your host as a client by installing Hadoop binaries compatible with the cluster's version. Running the same operating system as the cluster is the simplest setup, though what the client strictly needs is a compatible Java runtime.
• Make sure your host has the same configuration files as the cluster (core-site.xml, hdfs-site.xml).
• Run the hadoop fs -get command to fetch the files directly.
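
Once the client is set up, the last step can be wrapped in a small bash function. This is only a sketch: it assumes the hadoop command is on the PATH and that core-site.xml and hdfs-site.xml already point at the remote cluster. The function name fetch_from_hdfs and the example paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: copy one HDFS path into a local directory using an installed Hadoop client.
# Assumes `hadoop` is on PATH and the client configuration points at the remote cluster.

# fetch_from_hdfs <hdfs_path> <local_dir>   (hypothetical helper name)
fetch_from_hdfs() {
    local hdfs_path="$1"
    local local_dir="$2"

    if [ -z "$hdfs_path" ] || [ -z "$local_dir" ]; then
        echo "usage: fetch_from_hdfs <hdfs_path> <local_dir>" >&2
        return 2
    fi

    # Create the destination directory if it does not exist yet
    mkdir -p "$local_dir" || return 1

    # -get copies from HDFS to the local file system
    hadoop fs -get "$hdfs_path" "$local_dir"
}

# Example call (placeholder paths):
# fetch_from_hdfs /data/input/part-00000 ./downloads
```

The function returns the exit status of hadoop fs -get, so a calling script can check it and stop on failure.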

There are also alternatives:

• If WebHDFS or HttpFS is configured on the cluster, you can download files with curl or even a browser, so the download can be scripted in bash.
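
A WebHDFS download could be scripted roughly as follows. This sketch assumes WebHDFS is enabled on the NameNode, that the cluster does not require Kerberos authentication, and that the HDFS path needs no URL encoding. The helper names webhdfs_url and webhdfs_get, and the host in the example call, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download a file over the WebHDFS REST API with curl.
# Assumes an unsecured cluster and a path without characters that need URL encoding.

# webhdfs_url <namenode_host> <port> <hdfs_path>   (hypothetical helper name)
webhdfs_url() {
    local host="$1" port="$2" hdfs_path="$3"
    # WebHDFS read URLs have the form /webhdfs/v1/<path>?op=OPEN
    printf 'http://%s:%s/webhdfs/v1/%s?op=OPEN\n' "$host" "$port" "${hdfs_path#/}"
}

# webhdfs_get <namenode_host> <port> <hdfs_path> <local_file>   (hypothetical helper name)
webhdfs_get() {
    local url
    url="$(webhdfs_url "$1" "$2" "$3")"
    # -L follows the redirect from the NameNode to a DataNode;
    # -f makes curl fail on HTTP errors instead of saving the error body
    curl -sS -f -L -o "$4" "$url"
}

# Example call (placeholder host and paths; 9870 is the default NameNode
# HTTP port in Hadoop 3, 50070 in Hadoop 2):
# webhdfs_get namenode.example.com 9870 /data/input/part-00000 ./part-00000
```

Because this uses only curl, the local machine needs neither Hadoop binaries nor cluster configuration files.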

If your host cannot have the Hadoop binaries installed as a client, you can do the following instead:

• Enable passwordless login from your host to one of the nodes in the cluster.
• Run ssh <user>@<host> "hadoop fs -get <hdfs_path> <os_path>" to copy the files from HDFS to that node's file system.
• Use scp to copy the files from the node to your host.
• Both commands can go in a single script.
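
The two commands might be combined along these lines. This is a sketch under several assumptions: passwordless ssh is already set up, hadoop is on the PATH of the remote node, and none of the paths contain single quotes. The function name fetch_via_ssh and the staging directory in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stage files from HDFS on a cluster node, then copy them to this machine.
# Assumes passwordless ssh to the node and paths without single quotes.

# fetch_via_ssh <user@host> <hdfs_path> <remote_stage_dir> <local_dir>   (hypothetical helper name)
fetch_via_ssh() {
    local remote="$1" hdfs_path="$2" remote_stage="$3" local_dir="$4"

    # Step 1: on the cluster node, copy from HDFS to that node's file system
    ssh "$remote" "mkdir -p '$remote_stage' && hadoop fs -get '$hdfs_path' '$remote_stage/'" || return 1

    # Step 2: copy the staged directory from the node to this machine
    mkdir -p "$local_dir" || return 1
    scp -r "$remote:$remote_stage" "$local_dir/"
}

# Example call (placeholder user, host, and paths):
# fetch_via_ssh alice@edge-node.example.com /data/input /tmp/hdfs_stage ./downloads
```

The staged copy stays on the cluster node after the transfer, so a real script would probably remove it once scp succeeds.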
