Write output to multiple tables from REDUCER
Problem description
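Can I write output from my reducer to multiple tables in HBase? I went through different blog posts, but I am not able to find a way, even using MultiTableOutputFormat.
I referred to this: Write to multiple tables in HBASE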
But I am not able to figure out the API signature for the context.write call.
Reducer code:
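```java
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class MyReducer extends TableReducer<Text, Result, Put> {

    private static final Logger logger = Logger.getLogger( MyReducer.class );

    @SuppressWarnings( "deprecation" )
    @Override
    protected void reduce( Text key, Iterable<Result> data, Context context ) throws IOException, InterruptedException {
        logger.info( "Working on ---> " + key.toString() );
        for ( Result res : data ) {
            Put put = new Put( res.getRow() );
            KeyValue[] raw = res.raw();
            for ( KeyValue kv : raw ) {
                put.add( kv );
            }
            context.write( obj, put );
            // I don't know how to give the table name here.
        }
    }
}
```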
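To identify the target table, pass the table name as the key to the context.write(key, put) method. With MultiTableOutputFormat as the job's output format, the ImmutableBytesWritable output key is interpreted as the name of the table to write to: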
```java
ImmutableBytesWritable key = new ImmutableBytesWritable(Bytes.toBytes("tableName"));
context.write(key, put);
```
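A minimal sketch of a complete reducer along these lines, assuming an HBase 1.x-style API (Result.rawCells() and Put.add(Cell) in place of the deprecated raw() and put.add(KeyValue)) and two hypothetical, already existing tables named tableA and tableB. One detail the snippet above implies: TableReducer's third type parameter is the output key type, so it has to be ImmutableBytesWritable rather than Put for context.write(key, put) to compile. The class name MultiTableReducer and the helper configureOutput are illustrative, not part of HBase.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// KEYOUT is ImmutableBytesWritable: MultiTableOutputFormat reads the
// output key as the name of the destination table.
public class MultiTableReducer extends TableReducer<Text, Result, ImmutableBytesWritable> {

    // Hypothetical table names; both tables must already exist in HBase.
    static final ImmutableBytesWritable TABLE_A =
            new ImmutableBytesWritable(Bytes.toBytes("tableA"));
    static final ImmutableBytesWritable TABLE_B =
            new ImmutableBytesWritable(Bytes.toBytes("tableB"));

    @Override
    protected void reduce(Text key, Iterable<Result> data, Context context)
            throws IOException, InterruptedException {
        for (Result res : data) {
            Put put = new Put(res.getRow());
            // rawCells() replaces the deprecated raw(); each cell is added to the Put.
            for (Cell cell : res.rawCells()) {
                put.add(cell);
            }
            // The same Put is routed to two tables by changing only the key.
            context.write(TABLE_A, put);
            context.write(TABLE_B, put);
        }
    }

    // Job settings this reducer relies on (call from the driver).
    static void configureOutput(Job job) {
        job.setReducerClass(MultiTableReducer.class);
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);
    }
}
```

Calling MultiTableReducer.configureOutput(job) in the driver sets the output format and the output key and value classes; the input side of the job (mapper emitting Text keys and Result values) stays as it was.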
But if you want to load a huge amount of data in a single MapReduce job, MultiTableHFileOutputFormat may be worth a look. This output format creates HFiles for every HBase table you need, and you can then load those files with the LoadIncrementalHFiles tool:
```
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/multiTableJobResult hbaseTable
```
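Each LoadIncrementalHFiles invocation loads into one table, so with several tables the step has to be repeated per table. A sketch that drives the same tool from Java through ToolRunner, passing the same two arguments as the command line (HFile directory, table name). It assumes the HBase 1.x class named in the command, hypothetical tables tableA and tableB, and a hypothetical layout with one subdirectory per table under /tmp/multiTableJobResult; check where your output format actually writes its HFiles before relying on that layout.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.util.ToolRunner;

public class MultiTableBulkLoad {

    // Hypothetical layout: one HFile directory per table under the job output dir.
    static String hfileDirFor(String outputDir, String table) {
        return outputDir + "/" + table;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        String outputDir = "/tmp/multiTableJobResult";
        String[] tables = {"tableA", "tableB"}; // hypothetical table names
        for (String table : tables) {
            // Same arguments as the command line: <hfile dir> <table name>.
            int rc = ToolRunner.run(conf, new LoadIncrementalHFiles(conf),
                    new String[] {hfileDirFor(outputDir, table), table});
            if (rc != 0) {
                throw new IllegalStateException("Bulk load failed for " + table);
            }
        }
    }
}
```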
You can read more about MultiTableHFileOutputFormat in this article: http://tech.adroll.com/blog/data/2014/07/15/multi-table-bulk-import.html