了解涉及3个或更多表时JOIN的工作方式. [SQL] [英] Understanding how JOIN works when 3 or more tables are involved. [SQL]

查看:198
本文介绍了了解涉及3个或更多表时JOIN的工作方式. [SQL]的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想知道是否有人可以帮助提高我对SQL JOIN的理解. [如果对这个问题很重要,我正在专门考虑MS SQL Server.]

获取3个表A,B [与某A.AId相关的A]和C [B与某B.BId相关的C]]

如果我撰写查询,例如

SELECT *
FROM A JOIN B 
ON A.AId = B.AId

一切都很好-我对它的工作方式很满意.

当表C(或其他D,E,....被添加)时会发生什么

在这种情况下

SELECT *
FROM A JOIN B 
  ON A.AId = B.AId
JOIN C ON C.BId = B.BId

什么是C加入? -是B表(以及B表中的值吗?) 还是将C表联接到A + B联接的结果中得出一些其他临时结果集?

[根据A,B的连接条件,并非B表中的所有值都必然包含在临时结果集A + B中]

我为什么要问的一个特定(且相当人为)的示例是因为我试图理解以下行为:

Tables 
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)

Where:
Account->Opening, and Closing Balances are NULLABLE 
(may have opening balance, closing balance, or none)

Balance->BalanceToken is 1:m - a balance could consist of many tokens

从概念上讲,某个日期的期末余额将是明天的期初余额

如果我要查找一个帐户的所有期初和期末余额的列表

我可能会做类似的事情

SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A 
   LEFT JOIN BALANCE OpeningBal 
      ON A.OpeningBalanceId = OpeningBal.BalanceId
   LEFT JOIN BALANCE ClosingBal 
      ON A.ClosingBalanceId = ClosingBal.BalanceId
   LEFT JOIN BalanceToken openingBalanceAmounts 
      ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
   LEFT JOIN BalanceToken closingBalanceAmounts 
      ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
   GROUP BY AccountId, AccountBalanceDate  

事情如我所料,直到最后一个JOIN引入期末余额令牌-我最终在结果中得到重复.

[我可以使用DISTINCT进行修复-但我试图理解为什么发生了什么事

有人告诉我问题是因为Balance和BalanceToken之间的关系是1:M-并且当我引入最后一个JOIN时,我得到了重复,因为第三个JOIN已经多次将BalanceIds引入到(我假设)是临时结果集.

我知道示例表不符合良好的数据库设计

为这篇文章道歉,谢谢您的启发:)

根据马克的问题进行编辑

从概念上讲,对于一个帐户(每个AccountingDate),BalanceToken中不应存在重复项-我认为问题是由于1个Account/AccountingDates的期末余额是第二天的帐户期初余额-因此,当自我加入时多次使用Balance,BalanceToken获得期初和期末余额我认为Balances(BalanceId's)被多次引入结果组合"中.如果有助于阐明第二个示例,则可以将其视为每日对帐-因此是左联接-对于给定的帐户/会计日期组合,可能尚未计算期初(和/或)期末余额.

解决方案

概念上这是将三个表连接在一起时发生的情况.

  1. 优化器提供了一个计划,其中包括一个连接顺序.可以是A,B,C或C,B,A或任何组合
  2. 查询执行引擎将任何谓词(WHERE子句)应用于不涉及任何其他表的第一个表.它选择JOIN条件或SELECT列表或ORDER BY列表中提到的列.将此结果称为A
  3. 它将结果集连接到第二个表.对于每一行,它会连接到第二个表,并应用可能适用于第二个表的所有谓词.这将导致另一个临时结果集.
  4. 然后将其加入决赛桌并应用ORDER BY

从概念上讲,这是会发生的事情.实际上,在此过程中有许多可能的优化.关系模型的优点是可靠的数学基础使计划的各种转换成为可能,而不会改变正确性.

例如,实际上一路没有必要生成完整的结果集.相反,可以首先使用索引通过访问数据来完成ORDER BY.可以完成很多类型的联接.

I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]

Take 3 tables A, B [A related to be by some A.AId], and C [B related to C by some B.BId]

If I compose a query e.g

SELECT *
FROM A JOIN B 
ON A.AId = B.AId

All good - I'm sweet with how this works.

What happens when Table C (Or some other D,E, .... gets added)

In the situation

SELECT *
FROM A JOIN B 
  ON A.AId = B.AId
JOIN C ON C.BId = B.BId

What is C joining to? - is it that B table (and the values in the B table?) Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?

[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]

A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:

Tables 
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)

Where:
Account->Opening, and Closing Balances are NULLABLE 
(may have opening balance, closing balance, or none)

Balance->BalanceToken is 1:m - a balance could consist of many tokens

Conceptually, Closing Balance of a date, would be tomorrows opening balance

If I was trying to find a list of all the opening and closing balances for an account

I might do something like

SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A 
   LEFT JOIN BALANCE OpeningBal 
      ON A.OpeningBalanceId = OpeningBal.BalanceId
   LEFT JOIN BALANCE ClosingBal 
      ON A.ClosingBalanceId = ClosingBal.BalanceId
   LEFT JOIN BalanceToken openingBalanceAmounts 
      ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
   LEFT JOIN BalanceToken closingBalanceAmounts 
      ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
   GROUP BY AccountId, AccountBalanceDate  

Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.

[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]

I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.

I know that the example tables do not conform to good DB design

Apologies for the essay, thanks for any elightenment :)

Edit in response to question by Marc

Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.

解决方案

Conceptually here is what happens when you join three tables together.

  1. The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A or any of the combinations
  2. The query execution engine applies any predicates (WHERE clause) to the first table that doesn't involve any of the other tables. It selects out the columns mentioned in the JOIN conditions or the SELECT list or the ORDER BY list. Call this result A
  3. It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary resultset.
  4. Then it joins in the final table and applies the ORDER BY

This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.

For example, there is really no need to generate the full result sets along the way. The ORDER BY may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.

这篇关于了解涉及3个或更多表时JOIN的工作方式. [SQL]的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆