Can we get all the column names from an HBase table?

Published: 2018/5/31 19:20:15 | Tags: hadoop, hbase


Question

Setup:

I have an HBase table with 100M+ rows and 1 million+ columns. Every row has data in only 2 to 5 columns. There is just 1 column family.

Problem:

I want to find out all the distinct qualifiers (columns) in this column family. Is there a quick way to do that?

I can think of scanning the whole table, getting the familyMap for each row, and adding each qualifier to a Set<>. But that would be awfully slow, as there are 100M+ rows.
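For comparison, the brute-force idea can be modeled in plain Java. This is a minimal sketch, not HBase client code: each row is represented by a hypothetical `Map<String, Map<String, String>>` from family to qualifier→value, standing in for what `Result.getFamilyMap` returns, and the class name `NaiveQualifierScan` is made up for illustration. It shows that the Set-based approach is correct, but a single process has to touch every row to build the set.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the naive approach: one pass over all rows,
// collecting the qualifiers of one column family into a sorted set.
class NaiveQualifierScan {
    static Set<String> distinctQualifiers(List<Map<String, Map<String, String>>> rows, String family) {
        Set<String> qualifiers = new TreeSet<>();
        for (Map<String, Map<String, String>> row : rows) {
            // Stand-in for Result.getFamilyMap(family): qualifier -> value for this row.
            Map<String, String> familyMap = row.get(family);
            if (familyMap != null) {
                qualifiers.addAll(familyMap.keySet());
            }
        }
        return qualifiers;
    }

    public static void main(String[] args) {
        List<Map<String, Map<String, String>>> rows = List.of(
                Map.of("cf", Map.of("a", "1", "b", "2")),
                Map.of("cf", Map.of("b", "3", "c", "4")),
                Map.of("cf", Map.of("a", "5")));
        System.out.println(distinctQualifiers(rows, "cf")); // [a, b, c]
    }
}
```

The work grows linearly with the number of rows and runs in one JVM, which is the bottleneck the question is asking to avoid.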

Can we do any better?

Solution
You can use a MapReduce job for this. Unlike a coprocessor-based approach, it does not require installing custom libraries on the HBase cluster.
Below is the code for setting up the MapReduce job.

Job setup 
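    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Placeholders: config (e.g. HBaseConfiguration.create()), YOUR_TABLE_NAME (String),
    // YOUR_COLUMN_FAMILY_NAME (byte[]) and outputDir (Path).
    Job job = Job.getInstance(config);
    job.setJobName("Distinct columns");
    job.setJarByClass(OnlyColumnNameMapper.class);

    Scan scan = new Scan();
    scan.setBatch(500);
    scan.addFamily(YOUR_COLUMN_FAMILY_NAME);
    scan.setFilter(new KeyOnlyFilter()); // return only the key part of each KeyValue (row, column family, qualifier), not the value
    scan.setCacheBlocks(false);          // don't set to true for MR jobs

    TableMapReduceUtil.initTableMapperJob(
            YOUR_TABLE_NAME,
            scan,
            OnlyColumnNameMapper.class,   // mapper
            Text.class,                   // mapper output key
            Text.class,                   // mapper output value
            job);

    job.setNumReduceTasks(1);                          // one reducer, so one output file with every distinct qualifier
    job.setCombinerClass(OnlyColumnNameReducer.class); // deduplicate on the map side before the shuffle
    job.setReducerClass(OnlyColumnNameReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, outputDir);    // the default TextOutputFormat needs an output directory
    boolean succeeded = job.waitForCompletion(true);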
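`scan.setBatch(500)` caps the number of cells per `Result`, so a row wider than 500 cells can reach the mapper as several partial `Result` objects. That is harmless for this job because the mapper handles each cell independently. The plain-Java sketch below (hypothetical class `BatchModel`, no HBase involved) models that splitting and checks that the union of qualifiers across batches is unchanged. With only 2 to 5 cells per row, no row in this table would actually be split.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of Scan.setBatch: one row's cells are delivered in chunks
// of at most batchSize cells, each chunk standing in for one partial Result.
class BatchModel {
    static List<List<String>> split(List<String> rowQualifiers, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < rowQualifiers.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rowQualifiers.size());
            batches.add(new ArrayList<>(rowQualifiers.subList(start, end)));
        }
        return batches;
    }

    // What the mapper effectively sees: every cell, however the row was chunked.
    static Set<String> qualifiersSeen(List<List<String>> batches) {
        Set<String> seen = new HashSet<>();
        for (List<String> batch : batches) {
            seen.addAll(batch); // one map() call per batch
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> wideRow = List.of("q1", "q2", "q3", "q4", "q5");
        List<List<String>> batches = split(wideRow, 2);
        System.out.println(batches);                        // [[q1, q2], [q3, q4], [q5]]
        System.out.println(qualifiersSeen(batches).size()); // 5
    }
}
```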
Mapper
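    import java.io.IOException;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellScanner;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    // Emits (qualifier, empty) for every cell in the scanned column family.
    public class OnlyColumnNameMapper extends TableMapper<Text, Text> {
        private static final Text EMPTY = new Text();

        @Override
        protected void map(ImmutableBytesWritable key, Result value, final Context context)
                throws IOException, InterruptedException {
            CellScanner cellScanner = value.cellScanner();
            while (cellScanner.advance()) {
                Cell cell = cellScanner.current();
                // Copy just the qualifier bytes out of the cell's backing array.
                byte[] q = Bytes.copy(cell.getQualifierArray(),
                                      cell.getQualifierOffset(),
                                      cell.getQualifierLength());
                context.write(new Text(q), EMPTY);
            }
        }
    }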
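The offset/length arguments matter because `Cell.getQualifierArray()` may return a larger backing buffer (for the common `KeyValue` implementation, the whole serialized cell), with the qualifier located at `getQualifierOffset()` for `getQualifierLength()` bytes. The sketch below imitates that layout with a hand-built byte array; the buffer contents and the class name `QualifierSlice` are hypothetical, and `Arrays.copyOfRange` plays the role of `Bytes.copy`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a cell's backing buffer: row, family, qualifier and
// value packed into one array, the way a shared buffer might hold them.
class QualifierSlice {
    static byte[] copy(byte[] array, int offset, int length) {
        // Same idea as Bytes.copy(byte[], int, int): copy only the requested slice.
        return Arrays.copyOfRange(array, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] buffer = "row1cfcolAvalue".getBytes(StandardCharsets.UTF_8);
        int qualifierOffset = 6; // "row1" (4 bytes) + "cf" (2 bytes)
        int qualifierLength = 4; // "colA"

        byte[] wrong = buffer; // using the whole array would leak row, family and value bytes
        byte[] right = copy(buffer, qualifierOffset, qualifierLength);

        System.out.println(new String(wrong, StandardCharsets.UTF_8)); // row1cfcolAvalue
        System.out.println(new String(right, StandardCharsets.UTF_8)); // colA
    }
}
```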

Reducer
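    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Each distinct qualifier arrives as one key; write it once and ignore the (empty) values.
    public class OnlyColumnNameReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text());
        }
    }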
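To see why this yields each qualifier exactly once, the job's data flow can be simulated in memory. This is a hypothetical sketch, not Hadoop code (the class `DistinctColumnsSimulation` is made up): each input split is a list of rows, each row a list of qualifiers. The "map" step emits every qualifier; because `OnlyColumnNameReducer` only re-emits its key, the same logic is safe to run early as a combiner on each split's output; finally the shuffle groups identical keys, so the single reducer writes each distinct qualifier once, in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory simulation of the "Distinct columns" job.
// A split is a list of rows; a row is the list of qualifiers it contains.
class DistinctColumnsSimulation {
    // Map step: emit one key per cell, like OnlyColumnNameMapper.
    static List<String> map(List<List<String>> split) {
        List<String> emitted = new ArrayList<>();
        for (List<String> row : split) {
            emitted.addAll(row);
        }
        return emitted;
    }

    // Combiner step: reducer logic applied to one split's output (dedupe locally).
    static List<String> combine(List<String> emitted) {
        return new ArrayList<>(new TreeSet<>(emitted));
    }

    // Shuffle + single reducer: group by key (sorted), then emit each key once.
    static List<String> shuffleAndReduce(List<List<String>> mapOutputs) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (List<String> output : mapOutputs) {
            for (String key : output) {
                grouped.merge(key, 1, Integer::sum); // value count per group, ignored by reduce()
            }
        }
        return new ArrayList<>(grouped.keySet());
    }

    public static void main(String[] args) {
        List<List<String>> split1 = List.of(List.of("a", "b"), List.of("b", "c"));
        List<List<String>> split2 = List.of(List.of("c", "d"), List.of("a"));

        List<List<String>> mapOutputs = new ArrayList<>();
        for (List<List<String>> split : List.of(split1, split2)) {
            mapOutputs.add(combine(map(split)));
        }
        System.out.println(mapOutputs);                   // [[a, b, c], [a, c, d]]
        System.out.println(shuffleAndReduce(mapOutputs)); // [a, b, c, d]
    }
}
```

The combiner shrinks what crosses the network from one record per cell (hundreds of millions here) to at most one record per distinct qualifier per map task, without changing the final result.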


                        