Can we get all the column names from an HBase table?


Question


Setup:

I have an HBase table with 100M+ rows and 1 million+ columns. Each row has data for only 2 to 5 columns. There is just 1 column family.

Problem:

I want to find out all the distinct qualifiers (columns) in this column family. Is there a quick way to do that?

I can think of scanning the whole table, getting the familyMap for each row, extracting each qualifier, and adding it to a Set<>. But that would be awfully slow, as there are 100M+ rows.

Can we do any better?

Solution

You can use a MapReduce job for this. Unlike a coprocessor, it does not require installing custom libraries on the HBase cluster. Below is the code for creating such a job.

Job setup

    Job job = Job.getInstance(config);
    job.setJobName("Distinct columns");

    Scan scan = new Scan();
    scan.setBatch(500);
    scan.addFamily(YOU_COLUMN_FAMILY_NAME);
    scan.setFilter(new KeyOnlyFilter()); // return only the key part of each KeyValue (row, column family, column), not the values
    scan.setCacheBlocks(false);  // don't set to true for MR jobs


    TableMapReduceUtil.initTableMapperJob(
            YOU_TABLE_NAME,
            scan,          
            OnlyColumnNameMapper.class,   // mapper
            Text.class,             // mapper output key
            Text.class,             // mapper output value
            job);

    job.setNumReduceTasks(1);
    job.setReducerClass(OnlyColumnNameReducer.class);

Mapper

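    import java.io.IOException;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellScanner;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    public class OnlyColumnNameMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, final Context context)
                throws IOException, InterruptedException {
            CellScanner cellScanner = value.cellScanner();
            while (cellScanner.advance()) {
                Cell cell = cellScanner.current();
                // Copy only the qualifier bytes out of the cell's backing array.
                byte[] q = Bytes.copy(cell.getQualifierArray(),
                                      cell.getQualifierOffset(),
                                      cell.getQualifierLength());
                // Emit the qualifier as the key; the value is unused.
                context.write(new Text(q), new Text());
            }
        }
    }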

Reducer

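    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class OnlyColumnNameReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // The shuffle groups identical qualifiers, so each one reaches reduce() once as a key.
            context.write(new Text(key), new Text());
        }
    }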
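
To see why this pair of classes yields each qualifier exactly once, it may help to mimic the data flow in plain Java with no HBase or Hadoop dependency. The sketch below is only an illustration under stated assumptions: the class name `DistinctQualifiersSketch`, the helper methods `mapPhase` and `reducePhase`, and the three in-memory "rows" are all made up for this example and are not part of the answer's code. Each row is modeled as a map from qualifier to value, the map phase emits every qualifier it sees, and a sorted map stands in for the shuffle that groups identical keys before the reduce step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class DistinctQualifiersSketch {

    // Stand-in for the mapper: emit one key per cell (the qualifier), ignoring values.
    static List<String> mapPhase(List<Map<String, String>> rows) {
        List<String> emitted = new ArrayList<>();
        for (Map<String, String> row : rows) {
            emitted.addAll(row.keySet());
        }
        return emitted;
    }

    // Stand-in for shuffle + reducer: group identical keys, then write each key once.
    static List<String> reducePhase(List<String> emittedKeys) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (String key : emittedKeys) {
            // Every emitted record carries an empty value, as in the mapper above.
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add("");
        }
        // One output record per distinct key, regardless of how many values it has.
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        // Hypothetical sparse rows: each has only a few of the possible qualifiers.
        List<Map<String, String>> rows = List.of(
                Map.of("c1", "a", "c7", "b"),
                Map.of("c7", "x", "c3", "y", "c9", "z"),
                Map.of("c1", "q", "c3", "r"));

        System.out.println(reducePhase(mapPhase(rows))); // prints [c1, c3, c7, c9]
    }
}
```

In the real job the grouping is done by Hadoop's shuffle across many mappers rather than by an in-memory map, so only the reducer's output, the distinct qualifiers, has to fit in one place.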
