散列完整外部联接如何工作? [英] How does a hash full outer join work?

查看:119
本文介绍了散列完整外部联接如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道哈希左外部联接的算法是在右表上构建哈希表,然后遍历左表并在哈希表中搜索是否存在匹配项,但是完全外部联接如何工作?在浏览完左表中的值之后,您仍然需要一种方法来获取右表中没有左侧匹配项的元组.

解决方案

在循环探查记录时,您记录了哪些右元组在构建表中找到了匹配项.您只需为每个匹配的布尔值设置为true即可.作为算法的最后一步,您将扫描构建表并输出所有先前不匹配的元组.

据我所知,还有一种替代策略在RDBMS中未使用:构建左右元组的组合哈希表.将该表视为从哈希键到左元组列表再加上右元组列表的映射.通过遍历两个输入表来构建该表,将所有元组添加到哈希表中.在消耗完所有元组之后,对哈希表进行一次迭代,并相应地输出相等组(相等组中的所有左元组或所有右元组或所有左元组和所有右元组的叉积).

后一种算法非常适合内存中的工作负载(例如在客户端应用程序中).前者适用于非常大(或无法预测)的大型探针输入,因此RDBMS会使用该输入.

I know the algorithm for a hash left outer join is to build a hashtable on the right table and then loop through the left table and search in the hashtable for if there is a match, but how does a full outer join work? After you scan through the values in the left table you would still need a way to get the tuples in the right table that didn't have matches in the left.

解决方案

While looping through the probe records you record which right tuples have found a match in the build table. You just set a boolean to true for each one that matched. As a final pass in the algorithm you scan the build table and output all tuples that did not match previously.

There is an alternate strategy which is not used in RDBMS's as far as I'm aware: Build a combined hash table of left and right tuples. Treat that table as a map from hash key to a list of left tuples plus a list of right tuples. Build that table by looping through both input tables adding all tuples to the hash table. After all tuples have been consumed iterate over the hash table once and output the equality groups accordingly (either all left-tuples or all right-tuples or a cross-product of all left and all right tuples in the equality group).

The latter algorithm is nice for in-memory workloads (like in client applications). The former is good for an extremely (or unpredictably) large probe input so RDBMS's use that one.

这篇关于散列完整外部联接如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆