How can I skip HBase rows that are missing specific columns?
Problem description
I'm writing a MapReduce job over HBase using TableMapper. I want to skip rows that don't have specific columns. For example, if the mapper reads from the "meta" family, "source" qualifier column, the mapper should expect something to be in that column. I know I can add columns to the scan object, but I expect this merely limits which rows can be seen by the scan, not which columns need to be there.
What filter can I use to skip rows without the columns I need?
Also, the filter concept itself is a little strange. Does the filter operate on a row-by-row basis or a keyvalue-by-keyvalue basis? Does "filter a row" mean skip the row or include it, or simply put it through a filter?
Is this explained anywhere more clearly than in the HBase javadocs?
The HBase book is the best place to answer a large number of questions: http://hbase.apache.org/book/client.filter.html in particular explains how filters work.
Filters are very efficient because they run on the server side and reduce the amount of data flowing over the network. I agree that the javadocs make the include/exclude semantics non-obvious, but I think the book makes it clear: filters define what must be true in order for the row to be returned to the client.
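To make the include/exclude semantics concrete, here is a minimal plain-Java sketch that does not use the HBase client at all. It models each row as a map from "family:qualifier" to a value and returns a row only when the required column is present, which is the behavior the question asks for. The class `RowFilterSketch`, the method `keepRowsWithColumn`, and the sample data are hypothetical names made up for illustration. In the real client, `SingleColumnValueFilter` with `setFilterIfMissing(true)` is one filter that is commonly used to drop rows lacking a column, but check the javadocs for your HBase version before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java model of row-level filtering; this is not HBase code.
class RowFilterSketch {

    // Returns only the rows that contain the required column.
    // Rows are keyed by row key; each row maps "family:qualifier" to a value.
    static Map<String, Map<String, String>> keepRowsWithColumn(
            Map<String, Map<String, String>> rows, String requiredColumn) {
        Map<String, Map<String, String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            // The predicate states what must be true for the row to be returned.
            if (row.getValue().containsKey(requiredColumn)) {
                kept.put(row.getKey(), row.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("row1", Map.of("meta:source", "crawler", "meta:lang", "en"));
        rows.put("row2", Map.of("meta:lang", "fr")); // no meta:source, so it is skipped
        System.out.println(keepRowsWithColumn(rows, "meta:source").keySet()); // prints [row1]
    }
}
```

The point of the model is that the predicate describes the rows you keep, not the rows you throw away; a server-side filter applies the same idea before any data is sent to the client.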
Scans are also a good way to define what must be returned, although you need to be careful in how you define them. If you define a scan to contain a whole column family in one API call and then, later in your code, add a specific column qualifier from that family, the second call overrides the first: only that specific qualifier is returned, and no other qualifier in the column family is.
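The override behavior described above can be illustrated with a small plain-Java model. This is a sketch under the assumption that a scan tracks its selection as a map from family to a set of qualifiers, with "no set" meaning the whole family; it is not the HBase `Scan` class itself. The class `ScanSelectionSketch` and its `describe` method are hypothetical, and the method names `addFamily` and `addColumn` are borrowed only to mirror the calls being discussed.

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical simplified model of a scan's column selection; this is not HBase code.
class ScanSelectionSketch {
    // family -> selected qualifiers; a null value means "the whole family".
    private final Map<String, NavigableSet<String>> familyMap = new TreeMap<>();

    // Selects every column in the family, discarding any earlier qualifier list.
    ScanSelectionSketch addFamily(String family) {
        familyMap.put(family, null);
        return this;
    }

    // Selects one qualifier; if the whole family was selected, this narrows it.
    ScanSelectionSketch addColumn(String family, String qualifier) {
        NavigableSet<String> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            qualifiers = new TreeSet<>();
            familyMap.put(family, qualifiers);
        }
        qualifiers.add(qualifier);
        return this;
    }

    // Human-readable summary of what would be returned for a family.
    String describe(String family) {
        if (!familyMap.containsKey(family)) {
            return "not selected";
        }
        NavigableSet<String> qualifiers = familyMap.get(family);
        return qualifiers == null ? "whole family" : qualifiers.toString();
    }

    public static void main(String[] args) {
        ScanSelectionSketch scan = new ScanSelectionSketch().addFamily("meta");
        System.out.println(scan.describe("meta")); // prints whole family
        scan.addColumn("meta", "source");
        System.out.println(scan.describe("meta")); // prints [source]
    }
}
```

In this model the order of calls matters in both directions: adding a qualifier after the family narrows the selection, and adding the family again afterwards widens it back to every column.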