HDFS file watcher


Question

Can I set up a file watcher on HDFS?

Scenario: files land on HDFS continuously. I want to start a Spark job once they reach a threshold, which can be either the number of files or their total size.

Is it possible to implement a file watcher on HDFS to achieve this? If so, can anyone suggest how to do it? What options are available? Can ZooKeeper or Oozie do it?

Any help will be appreciated. Thanks.

Recommended answer

Hadoop 2.6 introduced DFSInotifyEventInputStream, which you can use for this. Get an instance of it from HdfsAdmin, then call .take() (blocks until events are available) or .poll() (returns immediately) to read the events. Event types include create, append and unlink (delete), which should cover what you're looking for.

Here's a basic example. Make sure you run it as the hdfs user, because the admin interface requires HDFS superuser privileges.

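import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSInotifyEventInputStream;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.inotify.Event;
import org.apache.hadoop.hdfs.inotify.Event.CreateEvent;
import org.apache.hadoop.hdfs.inotify.EventBatch;
import org.apache.hadoop.hdfs.inotify.MissingEventsException;

public class HdfsINotifyExample {
    public static void main( String[] args ) throws IOException, InterruptedException, MissingEventsException
    {
        // args[0] is the NameNode URI, for example hdfs://namenode:8020
        HdfsAdmin admin = new HdfsAdmin( URI.create( args[0] ), new Configuration() );
        DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
        while( true ) {
            // take() blocks until the next batch of events is available
            EventBatch events = eventStream.take();
            for( Event event : events.getEvents() ) {
                System.out.println( "event type = " + event.getEventType() );
                switch( event.getEventType() ) {
                    case CREATE:
                        CreateEvent createEvent = (CreateEvent) event;
                        System.out.println( "  path = " + createEvent.getPath() );
                        break;
                    default:
                        break;
                }
            }
        }
    }
}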
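
The example above only prints events. To cover the threshold part of the question, one option is to keep running totals and fire once either limit is hit. The following is a minimal sketch, not a tested production component: ThresholdTracker, its method names, and the watched directory are hypothetical names made up for illustration, and the class deliberately has no Hadoop dependency so the counting logic can be checked on its own.

```java
/**
 * Hypothetical helper (not part of Hadoop): counts finished files under one
 * directory and reports when a file-count or total-size threshold is reached.
 */
class ThresholdTracker {
    private final String watchedPrefix;
    private final int maxFiles;
    private final long maxBytes;
    private int fileCount = 0;
    private long totalBytes = 0L;

    ThresholdTracker(String watchedDir, int maxFiles, long maxBytes) {
        // Append a trailing slash so "/data/in" does not also match "/data/input".
        this.watchedPrefix = watchedDir.endsWith("/") ? watchedDir : watchedDir + "/";
        this.maxFiles = maxFiles;
        this.maxBytes = maxBytes;
    }

    /**
     * Records one finished file. Returns true when either threshold is
     * reached, and resets the counters so the next batch starts from zero.
     */
    boolean onFileClosed(String path, long sizeBytes) {
        if (!path.startsWith(watchedPrefix)) {
            return false; // event belongs to some other directory
        }
        fileCount++;
        totalBytes += sizeBytes;
        if (fileCount >= maxFiles || totalBytes >= maxBytes) {
            fileCount = 0;
            totalBytes = 0L;
            return true; // caller should start the Spark job now
        }
        return false;
    }
}
```

To wire it in, you could add a CLOSE case to the switch, cast the event to Event.CloseEvent, and pass closeEvent.getPath() and closeEvent.getFileSize() to onFileClosed. CLOSE is probably a better trigger than CREATE here, since CREATE fires when a file is opened for writing, before its data is complete. How the Spark job is actually started when the method returns true is left open.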

Here's a blog post that covers it in more detail:

http://johnjianfang.blogspot.com/2015/03/hdfs-6634-inotify-in-hdfs.html?m=1
