How to join tables in HBase

Views: 226 | Published: 2018/6/5 13:11:32 | Tags: mapreduce, hbase


Problem description

I have to join tables in HBase.

I integrated Hive and HBase, and that is working well; I can query using Hive.

But can somebody help me join tables in HBase without using Hive? I think we can achieve this with MapReduce. If so, can anybody share a working example that I can refer to?

Please share your opinions.

I have an approach in mind:

If I need to join tables A x B x C, I could use TableMapReduceUtil to iterate over A, get the data from B and C inside the TableMapper, and then use a TableReducer to write the result back to another table, Y.

Is this a good approach?
Solution
That is certainly one approach, but if you are doing two random reads per scanned row, your speed will plummet. If you are filtering out a significant share of the rows, or the dataset in A is small, that may not be an issue.
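To make the cost concrete, here is a minimal in-memory sketch of the approach from the question, written in plain Python rather than against the HBase client API. It assumes each row of A carries two hypothetical foreign-key fields, `b_id` and `c_id`, and uses a hypothetical `CountingTable` class as a stand-in for an HBase table so that the number of point lookups (random reads) can be counted.

```python
from typing import Dict, Optional, Tuple


class CountingTable:
    """Hypothetical in-memory stand-in for an HBase table that counts point lookups."""

    def __init__(self, rows: Dict[str, str]):
        self.rows = dict(rows)
        self.gets = 0  # number of random reads (Get-style lookups) issued so far

    def get(self, row_key: str) -> Optional[str]:
        self.gets += 1  # each call models one round trip to a region server
        return self.rows.get(row_key)


def lookup_join(table_a: Dict[str, Dict[str, str]],
                table_b: CountingTable,
                table_c: CountingTable) -> Dict[str, Tuple[Dict[str, str], str, str]]:
    """Scan A in row-key order; for every row, look up B and C by foreign key."""
    table_y = {}
    for a_key in sorted(table_a):            # plays the role of the TableMapper scanning A
        a_row = table_a[a_key]
        b_row = table_b.get(a_row["b_id"])  # random read #1
        c_row = table_c.get(a_row["c_id"])  # random read #2
        if b_row is not None and c_row is not None:  # keep inner-join matches only
            table_y[a_key] = (a_row, b_row, c_row)   # row destined for output table Y
    return table_y


table_a = {
    "a1": {"b_id": "b1", "c_id": "c1"},
    "a2": {"b_id": "b2", "c_id": "c9"},  # no matching row in C
    "a3": {"b_id": "b1", "c_id": "c2"},
}
b = CountingTable({"b1": "B-one", "b2": "B-two"})
c = CountingTable({"c1": "C-one", "c2": "C-two"})
y = lookup_join(table_a, b, c)
print(sorted(y))        # ['a1', 'a3']
print(b.gets + c.gets)  # 6, i.e. two reads per scanned row of A
```

The number of lookups grows as 2 × (rows scanned in A) regardless of how many rows actually match, which is why this pattern only stays fast when A is small or heavily filtered.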

Sort-merge Join

However, the best approach, which will be available in HBase 0.96, is the multiple-table input method (MultiTableInputFormat). A single job scans table A and writes its output keyed by a join key that rows from table B can match up with.

For example, table A emits (b_id, a_info) and table B emits (b_id, b_info); the shuffle groups both by b_id, and the two are merged in the reducer.

This is an example of a sort-merge join.
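The data flow of this reduce-side join can be simulated in plain Python. This is a sketch of the map, shuffle/sort and reduce phases only, not HBase or Hadoop code: the mapper and reducer functions, the tag strings "A"/"B", and the `a_info`/`b_info` fields are hypothetical, and in a real job the framework, not `list.sort`, would do the sorting.

```python
from itertools import groupby
from operator import itemgetter


def map_table_a(rows_a):
    """Hypothetical mapper for table A: emit (join key, tagged value)."""
    for a_key, row in rows_a.items():
        yield row["b_id"], ("A", a_key, row["a_info"])


def map_table_b(rows_b):
    """Hypothetical mapper for table B: its row key is already the join key."""
    for b_id, row in rows_b.items():
        yield b_id, ("B", b_id, row["b_info"])


def reduce_join(b_id, values):
    """Reducer: receives every tagged value for one b_id and pairs A rows with B rows."""
    a_side = [v for v in values if v[0] == "A"]
    b_side = [v for v in values if v[0] == "B"]
    for _, a_key, a_info in a_side:
        for _, _, b_info in b_side:
            yield b_id, a_key, a_info, b_info


def sort_merge_join(rows_a, rows_b):
    # Map phase: both tables feed the same job.
    mapped = list(map_table_a(rows_a)) + list(map_table_b(rows_b))
    # Shuffle/sort phase: sorting by key makes all values for one b_id adjacent.
    mapped.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct join key.
    joined = []
    for b_id, group in groupby(mapped, key=itemgetter(0)):
        joined.extend(reduce_join(b_id, [value for _, value in group]))
    return joined


rows_a = {"a1": {"b_id": "b1", "a_info": "x"},
          "a2": {"b_id": "b2", "a_info": "y"},
          "a3": {"b_id": "b1", "a_info": "z"}}
rows_b = {"b1": {"b_info": "B1"}, "b3": {"b_info": "B3"}}
print(sort_merge_join(rows_a, rows_b))
# [('b1', 'a1', 'x', 'B1'), ('b1', 'a3', 'z', 'B1')]
```

Tagging each value with its source table is what lets one reducer tell the two sides apart; keys that appear on only one side (b2 and b3 here) produce no output, giving inner-join semantics.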

Nested-Loop Join

If you are joining on the row key, or the join attribute is sorted in line with table B's row keys, you can keep one scanner instance in each task that reads sequentially from table B until it finds what it is looking for.

For example, table A's row key is "companyId" and table B's row key is "companyId_employeeId". Then, for each company in table A, you can get all of its employees using the nested-loop algorithm.

Pseudocode:

for company in TableA:
    for employee in TableB:
        if employee.company_id == company.id:
            emit(company.id, employee)
This is an example of a nested-loop join.
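The pseudocode above rescans table B for every company. When both tables are read in row-key order, the single forward-moving scanner described in the answer lets each row of B be read once. Below is a minimal in-memory sketch of that idea, with dicts standing in for HBase tables and a list index standing in for the scanner. It assumes fixed-width company ids (for example "c01", "c02"): with variable-width ids, "1" sorts before "10" in table A, but "10_x" sorts before "1_x" in table B, and a single forward scan would miss rows.

```python
def nested_loop_join(table_a, table_b):
    """Join companies (table A) with employees (table B, keys 'companyId_employeeId').

    Both inputs are dicts standing in for HBase tables; iterating sorted keys mimics
    a scanner, which returns rows in row-key order.
    """
    b_keys = sorted(table_b)
    pos = 0  # position of the single table-B "scanner"; it only ever moves forward
    joined = []
    for company_id in sorted(table_a):  # outer loop: scan table A
        prefix = company_id + "_"
        # Skip table-B rows belonging to companies that sort before this one.
        while pos < len(b_keys) and b_keys[pos] < prefix:
            pos += 1
        # Inner loop: read sequentially while rows belong to this company.
        while pos < len(b_keys) and b_keys[pos].startswith(prefix):
            key = b_keys[pos]
            joined.append((company_id, key[len(prefix):], table_b[key]))
            pos += 1
    return joined


companies = {"c01": "Acme", "c02": "Globex", "c04": "Initech"}
employees = {"c01_e1": "Ann", "c01_e2": "Bob", "c02_e7": "Cid",
             "c03_e9": "Dee", "c04_e3": "Eve"}
print(nested_loop_join(companies, employees))
# [('c01', 'e1', 'Ann'), ('c01', 'e2', 'Bob'), ('c02', 'e7', 'Cid'), ('c04', 'e3', 'Eve')]
```

Because the scanner position is never reset, the work is proportional to the sizes of A plus B rather than their product; employees of companies missing from A (c03 here) are simply skipped.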

More detailed descriptions of these join algorithms are here:

http://en.wikipedia.org/wiki/Nested_loop_join

http://en.wikipedia.org/wiki/Hash_join

http://en.wikipedia.org/wiki/Sort-merge_join
