Can the spool dir of Flume be on a remote machine?


Problem description

I am trying to fetch files from a remote machine into HDFS whenever a new file arrives in a particular folder. I came across the concept of the spool dir (spooling directory source) in Flume, and it works fine if the spool dir is on the same machine where the Flume agent is running.

Is there any way to configure a spool dir on a remote machine? Please help.

Solution

You might be aware that Flume can run as multiple instances, i.e. you can install several Flume agents that pass data between them.

So, to answer your question: no, Flume cannot access a remote spool directory. But you can install two agents, one on the machine with the spool directory and one on the Hadoop node.

The first agent reads from the spool directory and passes the events via Avro RPC to the second agent, which writes the data to HDFS.

It's a simple setup that requires just a couple of lines of configuration.
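
As a rough illustration of the two-agent layout described above, the configuration might look something like the following. This is only a sketch: the agent and component names (`agent1`, `agent2`, `spool-src`, `mem-ch`, `avro-sink`, `avro-src`, `hdfs-sink`), the hostnames, the port `4545`, and all paths are placeholders I made up, not values from the question, so adjust them to your environment.

```properties
# --- Agent 1: runs on the machine that holds the spool directory ---
# (all names, hosts, ports and paths below are assumed placeholders)
agent1.sources = spool-src
agent1.channels = mem-ch
agent1.sinks = avro-sink

# Spooling directory source: picks up new files dropped into spoolDir
agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /data/incoming
agent1.sources.spool-src.channels = mem-ch

agent1.channels.mem-ch.type = memory

# Avro sink: forwards events over Avro RPC to the second agent
agent1.sinks.avro-sink.type = avro
agent1.sinks.avro-sink.hostname = hadoop-node.example.com
agent1.sinks.avro-sink.port = 4545
agent1.sinks.avro-sink.channel = mem-ch
```

```properties
# --- Agent 2: runs on the Hadoop node ---
agent2.sources = avro-src
agent2.channels = mem-ch
agent2.sinks = hdfs-sink

# Avro source: listens for events sent by agent1's Avro sink
agent2.sources.avro-src.type = avro
agent2.sources.avro-src.bind = 0.0.0.0
agent2.sources.avro-src.port = 4545
agent2.sources.avro-src.channels = mem-ch

agent2.channels.mem-ch.type = memory

# HDFS sink: writes the received events to HDFS
agent2.sinks.hdfs-sink.type = hdfs
agent2.sinks.hdfs-sink.hdfs.path = hdfs://namenode.example.com:8020/flume/incoming
agent2.sinks.hdfs-sink.hdfs.fileType = DataStream
agent2.sinks.hdfs-sink.channel = mem-ch
```

The port on agent1's Avro sink has to match the port that agent2's Avro source listens on. A memory channel is used here only to keep the sketch short; a file channel is probably the safer choice if you cannot afford to lose events when an agent restarts.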
