How can I tell when new data has been added to HDFS?


Question


I am implementing a notification system based on the publish-subscribe model to announce the availability of data as it arrives in (or is loaded into) HDFS. I couldn't find a way to do this. Is there an HDFS API that can be used for this, or what method should I use to get information about new data written to HDFS? I am using Hadoop v2.0.2 and I don't want to use HCatalog; I want to implement my own tool to do this.

Answer

What you are looking for is the Oozie Coordinator.

HDFS is a file system, so something must be built on top of HDFS to check for file availability. HBase has coprocessors, which are triggered procedures, but they are only available for HBase tables, so they cannot be used to detect data availability in HDFS.
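
To illustrate what "something built on top of HDFS" could look like if you do roll your own tool, here is a minimal polling sketch. It is not an HDFS API: the class name `NewPathDetector`, the listing supplier, and the subscriber callbacks are all hypothetical names introduced for this example. On a real cluster the supplier would presumably wrap a call such as `FileSystem.listStatus(Path)` from the Hadoop client library; here it is kept abstract so the sketch depends only on the JDK.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Polls a directory listing and notifies subscribers about paths not seen before. */
class NewPathDetector {
    // Hypothetical listing source (assumption): returns the paths currently
    // present in the watched directory, e.g. by wrapping FileSystem.listStatus.
    private final Supplier<Collection<String>> lister;
    // Paths already reported in earlier polling cycles.
    private final Set<String> seen = new HashSet<>();
    // Subscribers of the publish-subscribe notification.
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    NewPathDetector(Supplier<Collection<String>> lister) {
        this.lister = lister;
    }

    /** Registers a callback that receives every newly detected path. */
    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    /** Runs one polling cycle and returns the paths that appeared since the previous one. */
    List<String> poll() {
        List<String> added = new ArrayList<>();
        for (String path : lister.get()) {
            // Set.add returns true only when the path was not seen before.
            if (seen.add(path)) {
                added.add(path);
            }
        }
        // Publish each new path to every subscriber.
        for (String path : added) {
            for (Consumer<String> subscriber : subscribers) {
                subscriber.accept(path);
            }
        }
        return added;
    }
}
```

A scheduler would call `poll()` at a fixed interval. Note that a path showing up in a listing does not mean the file has been completely written, which is one reason to prefer a tool that waits for an explicit completion marker.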

Oozie is a workflow scheduler system for managing Hadoop jobs. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. You can also execute other programs from it:

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

So you can use the file availability trigger for your notification system too.
