Count the number of records in a column family in an HBase table

Question

I'm looking for an HBase shell command that will count the number of records in a specified column family. I know I can run:

echo "scan 'table_name'" | hbase shell | grep column_family_name | wc -l

However, this runs much slower than the standard count command (because that one uses CACHE => 50000):

count 'table_name', CACHE => 50000

Worse, it doesn't return the real number of records, but something like the total number of cells in the specified column family (if I'm not mistaken). I need something like:

count 'table_name', CACHE => 50000, {COLUMNS => 'column_family_name'}

Thanks in advance,
Michael
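The suspicion in the question is correct: the shell's scan prints one line per cell, so piping it through grep and wc -l counts matching cells, not rows. The minimal Ruby sketch below runs on made-up scan output (the sample lines and the helper names cell_lines and rows_with_family are hypothetical) and contrasts the two counts:

```ruby
# Made-up scan output, modeled on the shell's one-line-per-cell layout.
SCAN_OUTPUT = <<~OUT
  ROW                  COLUMN+CELL
   row1                column=cf:a, timestamp=1, value=x
   row1                column=cf:b, timestamp=1, value=y
   row2                column=cf:a, timestamp=1, value=z
   row3                column=other:a, timestamp=1, value=w
  3 row(s)
OUT

# Roughly what `grep ... | wc -l` measures: one hit per matching cell line.
def cell_lines(output, family)
  output.lines.count { |line| line.include?("column=#{family}:") }
end

# What the question asks for: distinct row keys with at least one cell in the family.
def rows_with_family(output, family)
  output.lines
        .select { |line| line.include?("column=#{family}:") }
        .map { |line| line.split.first }
        .uniq
        .size
end

puts cell_lines(SCAN_OUTPUT, "cf")       # => 3
puts rows_with_family(SCAN_OUTPUT, "cf") # => 2
```

row1 has two cells in cf, so it is counted twice by the line count but once by the row count. The sketch matches on "column=cf:" rather than the bare family name, because a plain grep would also match the name wherever it appears in a row key or value.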

Recommended answer

Here is some Ruby code I wrote when I needed something like this; it is commented where needed. It adds a count_table command to the HBase shell. The first parameter is the table name and the second is a hash of options, the same as for the scan shell command.

For your case, that would be:

count_table 'your.table', { COLUMNS => 'your.family' }

I also recommend adding caching, as you would for scan:

count_table 'your.table', { COLUMNS => 'your.family', CACHE => 10000 }
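CACHE sets the scanner caching, i.e. how many rows the client fetches per RPC, which is why it speeds up a full count. The back-of-the-envelope Ruby sketch below estimates the number of round trips; the helper name scanner_round_trips is hypothetical, and the estimate ignores result-size limits and region boundaries (each region needs at least one call of its own):

```ruby
# Approximate scanner round trips needed to read `rows` rows when each RPC
# returns at most `cache` rows. Ignores size limits and region boundaries.
def scanner_round_trips(rows, cache)
  (rows + cache - 1) / cache # integer ceiling division
end

puts scanner_round_trips(1_000_000, 1)      # => 1000000
puts scanner_round_trips(1_000_000, 10_000) # => 100
```

The trade-off is memory: a larger CACHE means each batch of rows held on the client is bigger.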

And here is the source:

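# Arguments are the same as for the scan command.
# Examples:
#
# count_table 'test.table', { COLUMNS => 'f:c1' }
# --- Counts rows in 'test.table' that have an f:c1 cell.
#
# count_table 'other.table', { COLUMNS => 'f' }
# --- Counts rows in 'other.table' with at least one cell in family 'f'.
#
# count_table 'test.table', { CACHE => 1000 }
# --- Counts all rows, fetching 1000 rows per RPC.
#
def count_table(tablename, args = {})

    # Look up the table through the shell's own table wrapper
    table = @shell.hbase_table(tablename)

    # Build a scanner from the same options the scan command accepts.
    # Note: _get_scanner is an internal helper of the shell's table
    # class, so it may change between HBase versions.
    scanner = table._get_scanner(args)

    count = 0
    begin
        # Each Result returned by the scanner is one row,
        # no matter how many cells it contains
        iter = scanner.iterator
        while iter.hasNext
            iter.next
            count += 1
        end
    ensure
        # Release the server-side scanner even if iteration fails
        scanner.close
    end

    # Return the counter
    count
end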
