Hadoop mapreduce streaming from HBase
Question
I'm building a Hadoop (0.20.1) mapreduce job that uses HBase (0.20.1) as both the data source and the data sink. I would like to write the job in Python, which requires me to use hadoop-0.20.1-streaming.jar to stream data to and from my Python scripts. This works fine when the data source and sink are HDFS files.
Does Hadoop support streaming from/to HBase for mapreduce?
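For context, Hadoop Streaming runs the Python scripts as ordinary processes. Each input record arrives as a line on stdin, and each line written to stdout is split at the first tab into a key and a value. The minimal mapper sketch below shows that contract. The word-count logic and the names map_lines and main are hypothetical, chosen only for illustration. The sketch assumes plain text lines from HDFS and says nothing about how HBase rows would be serialized to a streaming script.

```python
import sys
from typing import Iterable, Iterator, TextIO


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical word-count mapper: emit 'word<TAB>1' per token."""
    for line in lines:
        for word in line.split():
            # Streaming treats the text before the first tab as the key.
            yield f"{word}\t1"


def main(stdin: TextIO, stdout: TextIO) -> None:
    # Read input records from stdin, write key/value lines to stdout.
    for record in map_lines(stdin):
        stdout.write(record + "\n")


if __name__ == "__main__":
    main(sys.stdin, sys.stdout)
```

Such a script would be handed to hadoop-0.20.1-streaming.jar through its -mapper option, with -input and -output naming the HDFS paths. That is the setup the question reports as working.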
Answer
The following project seems to do what I want, but it is not part of the Hadoop distribution. Any other suggestions or comments are still welcome.
http://github.com/wanpark/hadoop-hbase-streaming