HDFS file watcher


Question



Can I have a file watcher on HDFS?

Scenario: files land on HDFS continuously. I want to start a Spark job once a threshold is reached (either a number of files or a total size of the files).

Is it possible to implement a file watcher on HDFS to achieve this? If yes, can anyone suggest how to do it? What are the different options available? Can ZooKeeper or Oozie do it?

Any help will be appreciated. Thanks.

Solution

Hadoop 2.6 introduced DFSInotifyEventInputStream, which you can use for this. You can get an instance of it from HdfsAdmin and then call .take(), which blocks until a new batch of events is available, or .poll(), which returns immediately (null if there is nothing new). Event types include create, append, close, rename and unlink (delete), which should cover what you're looking for.

Here's a basic example. Make sure you run it as the hdfs user, since the admin interface requires HDFS superuser privileges.

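import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.Event.CreateEvent;
import org.apache.hadoop.hdfs.inotify.EventBatch;
import org.apache.hadoop.hdfs.inotify.MissingEventsException;

public class HdfsINotifyExample {
    // args[0] is the NameNode URI, e.g. hdfs://namenode:8020
    public static void main( String[] args ) throws IOException, InterruptedException, MissingEventsException
    {
        HdfsAdmin admin = new HdfsAdmin( URI.create( args[0] ), new Configuration() );
        DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
        while( true ) {
            // take() blocks until the next batch of events is available
            EventBatch events = eventStream.take();
            for( Event event : events.getEvents() ) {
                System.out.println( "event type = " + event.getEventType() );
                switch( event.getEventType() ) {
                    case CREATE:
                        CreateEvent createEvent = (CreateEvent) event;
                        System.out.println( "  path = " + createEvent.getPath() );
                        break;
                    default:
                        // Other event types (CLOSE, APPEND, RENAME, UNLINK, ...) are ignored here
                        break;
                }
            }
        }
    }
}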
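The example above only prints CREATE events. For the threshold in the question, CLOSE events are arguably a better signal: CREATE fires when a file is opened for writing, while `Event.CloseEvent` reports the final length through `getFileSize()`. The event stream is also not scoped to a directory, so you need to filter paths yourself. Below is a minimal, Hadoop-free sketch of the counting side, assuming a single watched directory; `ThresholdTrigger` and `onFileClosed` are hypothetical names, not part of Hadoop. In the loop above you would add a `case CLOSE:` that casts the event to `CloseEvent`, passes `getPath()` and `getFileSize()` to `onFileClosed`, and submits the Spark job whenever a non-empty batch comes back.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Hadoop): accumulates closed files under one
// watched directory and hands back a batch once a count or size threshold is hit.
class ThresholdTrigger {
    private final String watchedDir; // always ends with "/"
    private final int maxFiles;      // fire when this many files have been closed...
    private final long maxBytes;     // ...or when their total size reaches this many bytes
    private final List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    ThresholdTrigger(String watchedDir, int maxFiles, long maxBytes) {
        // Append "/" so that "/landing" does not also match "/landing2/..."
        this.watchedDir = watchedDir.endsWith("/") ? watchedDir : watchedDir + "/";
        this.maxFiles = maxFiles;
        this.maxBytes = maxBytes;
    }

    // Records one closed file. Returns the paths to process (and resets the
    // counters) if a threshold was reached, otherwise an empty list.
    List<String> onFileClosed(String path, long sizeBytes) {
        if (!path.startsWith(watchedDir)) {
            return Collections.emptyList(); // event outside the watched directory
        }
        pending.add(path);
        pendingBytes += sizeBytes;
        if (pending.size() >= maxFiles || pendingBytes >= maxBytes) {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            pendingBytes = 0;
            return batch;
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ThresholdTrigger trigger = new ThresholdTrigger("/landing", 3, 1_000_000L);
        System.out.println(trigger.onFileClosed("/landing/a.csv", 100)); // []
        System.out.println(trigger.onFileClosed("/other/x.csv", 100));   // []
        System.out.println(trigger.onFileClosed("/landing/b.csv", 200)); // []
        System.out.println(trigger.onFileClosed("/landing/c.csv", 300)); // [/landing/a.csv, /landing/b.csv, /landing/c.csv]
    }
}
```

Counting CLOSE events is a design choice rather than the only option: clients that write under a temporary name and then rename the file into place (for example `hdfs dfs -put`, which writes `<name>._COPYING_` first) produce a RENAME for the final path, so in that setup you may prefer to trigger on RENAME events whose destination is in the watched directory.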

Here's a blog post that covers it in more detail:

http://johnjianfang.blogspot.com/2015/03/hdfs-6634-inotify-in-hdfs.html?m=1
