Hadoop mapreduce streaming from HBase
Question
I'm building a Hadoop (0.20.1) MapReduce job that uses HBase (0.20.1) as both the data source and the data sink. I would like to write the job in Python, which requires me to use hadoop-0.20.1-streaming.jar to stream data to and from my Python scripts. This works fine if the data source and sink are HDFS files.
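For context on the HDFS case that already works, here is a minimal sketch of a Python script driven by Hadoop Streaming. Streaming passes each input record to the script as a line on stdin and reads tab-separated key/value lines back from stdout; between the map and reduce stages it sorts records by key. The word-count logic and the names `map_lines`/`reduce_lines` are hypothetical, chosen only to illustrate that protocol. The script does not read from or write to HBase, which is exactly the missing piece the question asks about.

```python
#!/usr/bin/env python
"""Hypothetical Hadoop Streaming word count: mapper and reducer in one file.

Hadoop Streaming writes input records to stdin (one per line) and reads
tab-separated "key<TAB>value" lines from stdout.
"""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Streaming sorts mapper output by key, so all lines for one key arrive
    consecutively and groupby() can collect them. Assumes every non-blank
    line is well-formed 'key<TAB>count' output from the mapper.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


if __name__ == "__main__":
    # Select the stage from the command line: "map" (default) or "reduce".
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    stage_func = reduce_lines if stage == "reduce" else map_lines
    for out in stage_func(sys.stdin):
        print(out)
```

With HDFS paths, a job like this would be submitted along the lines of `hadoop jar hadoop-0.20.1-streaming.jar -input <in> -output <out> -mapper "wordcount.py map" -reducer "wordcount.py reduce" -file wordcount.py`; the file name `wordcount.py` is likewise only illustrative.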
Does Hadoop support streaming from/to HBase for MapReduce?
The following project seems to do what I want, but it's not part of the Hadoop distribution. Any other suggestions or comments are still welcome.
http://github.com/wanpark/hadoop-hbase-streaming