How to know that new data has been added to HDFS?

Question

I am implementing a notification system based on the publish-subscribe model to announce that data is available as it arrives in (or is loaded into) HDFS. I couldn't find where to look for this. Is there an HDFS API that can do this, or what approach should I use to find out about new data written to HDFS? I am using Hadoop v2.0.2. I don't want to use HCatalog; I want to implement my own tool for this.

Answer

What you are looking for is Oozie Coordinator.

HDFS is a file system, so something must be built on top of HDFS to check for file availability. HBase has coprocessors, which are triggered procedures, but they are only available for HBase tables, so they cannot be used to detect data availability in HDFS.
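One way to "build something on top" of a file system without Oozie is polling: list a directory periodically, diff the listing against what was seen before, and publish each new path to subscribers. The sketch below is a minimal illustration under stated assumptions: a local directory and `os.listdir` stand in for the HDFS path and an HDFS listing (against a real cluster you would substitute a listing through an HDFS client, such as the Java `FileSystem.listStatus` API or the WebHDFS `LISTSTATUS` operation), and `DirectoryWatcher`, `subscribe` and `poll_once` are hypothetical names.

```python
import os
from typing import Callable, List, Set

# A subscriber receives the full path of each newly seen file.
Subscriber = Callable[[str], None]


class DirectoryWatcher:
    """Polls one directory and publishes files that appear after startup.

    Hypothetical stand-in for an HDFS watcher: a local directory replaces
    the HDFS path and os.listdir replaces an HDFS listing call.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        # Files already present at startup are treated as "old".
        self.seen: Set[str] = set(os.listdir(path))
        self.subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        """Register a callback to be notified about new files."""
        self.subscribers.append(callback)

    def poll_once(self) -> List[str]:
        """List the directory once, notify subscribers, return new names."""
        current = set(os.listdir(self.path))
        new_files = sorted(current - self.seen)
        self.seen |= current
        for name in new_files:
            full_path = os.path.join(self.path, name)
            for callback in self.subscribers:
                callback(full_path)
        return new_files
```

In practice `poll_once()` would be called in a loop with `time.sleep(...)` between calls. A listing only shows that a file exists, not that its writer has finished, so a watcher usually reacts to a completion marker instead, such as the `_SUCCESS` file that MapReduce jobs write into their output directory by default.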

Oozie is a workflow scheduler system to manage Hadoop jobs. Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. You can also run other programs from it:

Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
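The data-availability trigger works, roughly, by resolving a dataset's `uri-template` for each nominal time and then checking for a done-flag file in that directory; in recent coordinator schema versions the flag defaults to `_SUCCESS`, and an empty `done-flag` means the existence of the directory itself is enough. The Python sketch below models that check on a local path so the same idea can be reused in a custom tool. It is a simplified illustration, not Oozie's code: `resolve_uri_template` and `instance_available` are hypothetical helpers, the path in the example is made up, and real coordinators use full URIs and also handle time zones and dataset frequencies.

```python
import os
from datetime import datetime


def resolve_uri_template(template: str, nominal: datetime) -> str:
    """Fill the date/time variables of an Oozie-style uri-template."""
    values = {
        "${YEAR}": f"{nominal.year:04d}",
        "${MONTH}": f"{nominal.month:02d}",
        "${DAY}": f"{nominal.day:02d}",
        "${HOUR}": f"{nominal.hour:02d}",
        "${MINUTE}": f"{nominal.minute:02d}",
    }
    for variable, value in values.items():
        template = template.replace(variable, value)
    return template


def instance_available(instance_dir: str, done_flag: str = "_SUCCESS") -> bool:
    """Model of the availability check for one dataset instance.

    With a non-empty done_flag, the instance is ready once that file exists
    inside the directory; with an empty done_flag, the directory's own
    existence is enough.
    """
    if done_flag == "":
        return os.path.isdir(instance_dir)
    return os.path.exists(os.path.join(instance_dir, done_flag))
```

For example, `resolve_uri_template("/data/logs/${YEAR}/${MONTH}/${DAY}", datetime(2013, 3, 7))` yields `/data/logs/2013/03/07`, and the instance counts as available only once `/data/logs/2013/03/07/_SUCCESS` exists.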

So you can also use its data-availability trigger to drive your notification system.
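To turn the trigger into a notification, the workflow started by the coordinator can run a small program (for example through a shell or Java action) that receives the resolved input path and publishes a message. A coordinator can hand that path to the workflow as a property using the `${coord:dataIn('input')}` EL function, where `input` is a hypothetical name for the coordinator's input event. The sketch below assumes a hypothetical JSON message schema and uses plain callbacks as subscribers; in a real deployment `publish` would pass the message to whatever broker your publish-subscribe system uses.

```python
import json
from typing import Callable, List


def build_notification(dataset: str, instance_path: str, nominal_time: str) -> str:
    """Serialize a 'data available' event as JSON (hypothetical schema)."""
    return json.dumps(
        {
            "event": "data_available",
            "dataset": dataset,
            "path": instance_path,
            "nominal_time": nominal_time,
        },
        sort_keys=True,
    )


def publish(message: str, subscribers: List[Callable[[str], None]]) -> None:
    """Deliver the message to every subscriber callback, in order."""
    for deliver in subscribers:
        deliver(message)
```

A short entry-point script for the action could read the dataset name, instance path and nominal time from its arguments and call `publish(build_notification(...), [...])` with the broker client as the subscriber.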
