How to query HBase data using MapReduce?
Problem description
Hi, I am new to MapReduce and HBase. Please guide me. I am moving tabular data into HBase using MapReduce. The data has now reached HBase (and therefore HDFS). I created a MapReduce job that reads tabular data from a file and puts it into HBase using the HBase APIs.
Now my question is: can I query HBase data using MapReduce? I don't want to execute HBase commands to query the data. Is it possible to query HBase data using MapReduce?
Please help or advise.
Of course you can. HBase comes with TableMapReduceUtil to help you configure MapReduce jobs for scanning data. It will automatically create a map task for each region.
Please check this example, extracted from the HBase book:
Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);     // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs
...

TableMapReduceUtil.initTableMapperJob(
    tableName,       // input HBase table name
    scan,            // Scan instance to control CF and attribute selection
    MyMapper.class,  // mapper
    null,            // mapper output key
    null,            // mapper output value
    job);
job.setOutputFormatClass(NullOutputFormat.class);  // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
    throw new IOException("error with job!");
}
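To make the "one map task per region" idea more concrete, here is a minimal plain-Java simulation. It deliberately does not use any HBase or Hadoop classes. The class RegionScanSketch, its methods, the sample rows, and the region boundaries are all hypothetical names and data invented for illustration. It only sketches how a query can run as a set of independent scans, one per key range, with the per-row filter playing the role that MyMapper would play in the real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.BiPredicate;

// Plain-Java simulation only: no HBase classes are used here (assumption for illustration).
class RegionScanSketch {

    // A toy "table": row key -> (column -> value), kept sorted by row key.
    static NavigableMap<String, Map<String, String>> sampleTable() {
        NavigableMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("city", "Pune", "age", "31"));
        table.put("row2", Map.of("city", "Delhi", "age", "25"));
        table.put("row3", Map.of("city", "Pune", "age", "40"));
        table.put("row4", Map.of("city", "Mumbai", "age", "28"));
        return table;
    }

    // One simulated map task: scan rows in [startKey, endKey) and keep matching row keys.
    // A null endKey means "scan to the end of the table".
    static List<String> mapTask(NavigableMap<String, Map<String, String>> table,
                                String startKey, String endKey,
                                BiPredicate<String, Map<String, String>> query) {
        NavigableMap<String, Map<String, String>> slice = (endKey == null)
                ? table.tailMap(startKey, true)
                : table.subMap(startKey, true, endKey, false);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : slice.entrySet()) {
            if (query.test(row.getKey(), row.getValue())) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }

    // Run one simulated map task per region. regionStartKeys must be sorted ascending.
    static List<String> runQuery(NavigableMap<String, Map<String, String>> table,
                                 List<String> regionStartKeys,
                                 BiPredicate<String, Map<String, String>> query) {
        List<String> allHits = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            String start = regionStartKeys.get(i);
            String end = (i + 1 < regionStartKeys.size()) ? regionStartKeys.get(i + 1) : null;
            allHits.addAll(mapTask(table, start, end, query));
        }
        return allHits;
    }

    public static void main(String[] args) {
        // Two hypothetical regions: ["", "row3") and ["row3", end of table).
        List<String> regions = List.of("", "row3");
        List<String> hits = runQuery(sampleTable(), regions,
                (rowKey, columns) -> "Pune".equals(columns.get("city")));
        System.out.println(hits); // prints [row1, row3]
    }
}
```

In the real job the key ranges come from the table's regions and the tasks run in parallel on the cluster. The simulation runs them one after another only to keep the sketch short.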