Can we get all the column names from an HBase table?
Setup:
I have an HBase table with 100M+ rows and 1 million+ columns. Every row has data for only 2 to 5 columns. There is just 1 column family.
Problem:
I want to find all the distinct qualifiers (columns) in this column family. Is there a quick way to do that?
I can think of scanning the whole table, getting the familyMap for each row, reading each qualifier, and adding it to a Set<>. But that would be awfully slow, as there are 100M+ rows.
Can we do any better?
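For reference, the single-client approach described above can be sketched in plain Java with no HBase dependency. Each row is modelled as a map from qualifier bytes to value bytes, standing in for what `Result.getFamilyMap` returns; the class name, method names, and sample data are hypothetical, and Java 9+ is assumed for `Arrays.compareUnsigned`. One detail worth noting: `byte[]` has no content-based `equals`/`hashCode`, so a `HashSet<byte[]>` would not deduplicate, while a sorted set with a byte-wise comparator does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class NaiveDistinctQualifiers {

    // Compares byte arrays by content (unsigned, lexicographic).
    static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

    // Collects every distinct qualifier seen across all rows.
    // Each element of `rows` plays the role of one row's family map.
    static SortedSet<byte[]> distinctQualifiers(Iterable<? extends Map<byte[], byte[]>> rows) {
        SortedSet<byte[]> seen = new TreeSet<>(BYTEWISE);
        for (Map<byte[], byte[]> familyMap : rows) {
            seen.addAll(familyMap.keySet());
        }
        return seen;
    }

    // Hypothetical helper: builds a sparse row holding the given qualifiers with empty values.
    static Map<byte[], byte[]> row(String... qualifiers) {
        Map<byte[], byte[]> familyMap = new TreeMap<>(BYTEWISE);
        for (String q : qualifiers) {
            familyMap.put(q.getBytes(StandardCharsets.UTF_8), new byte[0]);
        }
        return familyMap;
    }

    // Hypothetical helper: decodes qualifiers for display.
    static List<String> asStrings(Iterable<byte[]> qualifiers) {
        List<String> out = new ArrayList<>();
        for (byte[] q : qualifiers) {
            out.add(new String(q, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<byte[], byte[]>> rows = Arrays.asList(row("a", "b"), row("b", "c"));
        System.out.println(asStrings(distinctQualifiers(rows)));
    }
}
```

Against a real table the loop body stays the same, but every one of the 100M+ rows still has to pass through a single client, which is the bottleneck the question is worried about.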
Solution
You can use a MapReduce job for this. Unlike a coprocessor, it does not require installing custom libraries on the HBase cluster.
Below is code for creating the MapReduce job.
Job setup
Job job = Job.getInstance(config);
job.setJobName("Distinct columns");

Scan scan = new Scan();
scan.setBatch(500);
scan.addFamily(YOU_COLUMN_FAMILY_NAME);
scan.setFilter(new KeyOnlyFilter()); // return only the key part of each KeyValue (row, column family, qualifier)
scan.setCacheBlocks(false); // don't set to true for MR jobs

TableMapReduceUtil.initTableMapperJob(
    YOU_TABLE_NAME,
    scan,
    OnlyColumnNameMapper.class, // mapper
    Text.class, // mapper output key
    Text.class, // mapper output value
    job);

job.setNumReduceTasks(1);
job.setReducerClass(OnlyColumnNameReducer.class);
Mapper
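import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class OnlyColumnNameMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, final Context context) throws IOException, InterruptedException {
        CellScanner cellScanner = value.cellScanner();
        while (cellScanner.advance()) {
            Cell cell = cellScanner.current();
            // Copy only the qualifier bytes out of the cell's backing array.
            byte[] q = Bytes.copy(cell.getQualifierArray(),
                    cell.getQualifierOffset(),
                    cell.getQualifierLength());
            // Emit the qualifier as the key; the value is unused.
            context.write(new Text(q), new Text());
        }
    }
}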
Reducer
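import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class OnlyColumnNameReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Called once per distinct qualifier; the grouped values are ignored.
        context.write(new Text(key), new Text());
    }
}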
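To see why this job yields each column name exactly once, the map, shuffle, and reduce steps can be imitated in plain Java. This is only an illustrative sketch with no Hadoop or HBase dependency: the class and method names are hypothetical, qualifiers are modelled as strings rather than `Text`, and the shuffle is approximated with a sorted map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class DistinctColumnsSimulation {

    // Map phase: emit one (qualifier, "") pair per cell, as the mapper above does.
    static List<String[]> mapPhase(List<List<String>> rows) {
        List<String[]> pairs = new ArrayList<>();
        for (List<String> rowQualifiers : rows) {
            for (String qualifier : rowQualifiers) {
                pairs.add(new String[] {qualifier, ""});
            }
        }
        return pairs;
    }

    // Shuffle: group all emitted values under their key, with keys in sorted order.
    static SortedMap<String, List<String>> shuffle(List<String[]> pairs) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    // Reduce phase: one call per distinct key; the values are ignored and the key is emitted once.
    static List<String> reducePhase(SortedMap<String, List<String>> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    static List<String> run(List<List<String>> rows) {
        return reducePhase(shuffle(mapPhase(rows)));
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("c1", "c7"),
                Arrays.asList("c7", "c2"),
                Arrays.asList("c1"));
        System.out.println(run(rows));
    }
}
```

The deduplication comes entirely from the grouping step: however many rows carry a qualifier, the reducer sees it as a single key.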