Download file weekly from FTP to HDFS

Problem Description

I want to automate the weekly download of a file from an FTP server into a CDH5 Hadoop cluster. What would be the best way to do this?

I was thinking about an Oozie coordinator job but I can't think of a good method to download the file.

Solution

Since you're using CDH5, it's worth noting that the NFSv3 interface to HDFS is included in that Hadoop distribution. You should check for "Configuring an NFSv3 Gateway" in the CDH5 Installation Guide documentation.
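
As a point of reference, mounting the gateway usually amounts to a single mount command along the lines of the sketch below; the gateway host name and local mount point are placeholders I've made up, not values from the original answer.

    # Minimal sketch, assuming a gateway running on "nfs-gateway-host" and a
    # local mount point /hdfs_nfs (both placeholder names).
    sudo mkdir -p /hdfs_nfs
    sudo mount -t nfs -o vers=3,proto=tcp,nolock,sync nfs-gateway-host:/ /hdfs_nfs

Once mounted, HDFS appears as an ordinary local directory tree, which is what makes the plain wget/curl approach below possible.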

Once that's done, you could use wget, curl, python, etc. to put the file onto the NFS mount. You probably want to do this through Oozie ... go into the job Designer and create a copy of the "Shell" command. Put in the command that you've selected to do the data transfer (python script, curl, ftp, etc.), and parameterize the job using ${myVar}.
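
As a rough sketch of the kind of script that could sit behind that Shell action (the FTP URL, destination path, and parameter names here are hypothetical, not taken from the original answer):

    #!/bin/bash
    # Hypothetical weekly transfer script for the Oozie Shell action.
    # The FTP URL would typically be supplied as a parameter (e.g. ${myVar}).
    FTP_URL="$1"                      # e.g. ftp://ftp.example.com/exports/report.csv
    DEST_DIR="/hdfs_nfs/data/weekly"  # HDFS directory, reachable through the NFS mount

    curl --silent --show-error "$FTP_URL" \
         -o "$DEST_DIR/report_$(date +%Y%m%d).csv"

The weekly trigger itself would then come from the Oozie coordinator mentioned in the question, which only has to run this shell job once a week with the appropriate parameter values.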

It's not perfect, but I think it's fairly elegant.
