了解涉及 3 个或更多表时 JOIN 的工作原理.[SQL] [英] Understanding how JOIN works when 3 or more tables are involved. [SQL]
问题描述
我想知道是否有人可以帮助我提高对 SQL 中的 JOIN 的理解.[如果它对问题很重要,我会特别考虑 MS SQL Server.]
I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]
取 3 个表 A、B [A 通过一些 A.AId 与 B 相关] 和 C [B 通过一些 B.BId 与 C 相关]
Take 3 tables A, B [A related to B by some A.AId], and C [B related to C by some B.BId]
如果我编写一个查询,例如
If I compose a query e.g
SELECT *
FROM A JOIN B
ON A.AId = B.AId
一切都很好 - 我对它的工作原理很满意.
All good - I'm sweet with how this works.
当表 C(或其他一些 D、E、.... 被添加)时会发生什么
What happens when Table C (Or some other D,E, .... gets added)
情况
SELECT *
FROM A JOIN B
ON A.AId = B.AId
JOIN C ON C.BId = B.BId
C 加入了什么?- 是那个 B 表(以及其中的值)吗?还是其他一些临时结果集是 C 表所连接的 A+B 连接的结果?
What is C joining to? - is it that B table (and the values therein)? Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?
[这意味着不是所有在 B 表中的值都必须在临时结果集 A+B 中,基于 A,B 的连接条件]
[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
我之所以问这个问题的一个具体(并且相当人为的)示例是因为我试图理解我在以下内容中看到的行为:
A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:
Tables
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)
Where:
Account->Opening, and Closing Balances are NULLABLE
(may have opening balance, closing balance, or none)
Balance->BalanceToken is 1:m - a balance could consist of many tokens
从概念上讲,一个日期的期末余额,就是明天的期初余额
Conceptually, Closing Balance of a date, would be tomorrows opening balance
如果我试图找到一个账户的所有期初和期末余额的列表
If I was trying to find a list of all the opening and closing balances for an account
我可能会做类似的事情
SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
LEFT JOIN BALANCE OpeningBal
ON A.OpeningBalanceId = OpeningBal.BalanceId
LEFT JOIN BALANCE ClosingBal
ON A.ClosingBalanceId = ClosingBal.BalanceId
LEFT JOIN BalanceToken openingBalanceAmounts
ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
LEFT JOIN BalanceToken closingBalanceAmounts
ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate
事情按照我的预期工作,直到最后一个 JOIN 引入期末余额令牌 - 结果我最终得到重复.
Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.
[我可以用 DISTINCT 修复 - 但我试图理解为什么正在发生的事情正在发生]
[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]
我被告知问题是因为 Balance 和 BalanceToken 之间的关系是 1:M - 当我引入最后一个 JOIN 时,我得到了重复项,因为第三个 JOIN 已经多次引入 BalanceIds (我假设)临时结果集.
I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
我知道示例表不符合良好的数据库设计
I know that the example tables do not conform to good DB design
为这篇文章道歉,感谢您的启发:)
Apologies for the essay, thanks for any elightenment :)
针对 Marc 的问题进行编辑
Edit in response to question by Marc
从概念上讲,对于一个帐户(每个 AccountingDate),一个帐户不应该在 BalanceToken 中重复 - 我认为问题的产生是因为 1 个帐户/AccountingDates 期末余额是第二天的帐户期初余额 - 所以当自己加入Balance, BalanceToken 多次获得期初和期末余额 我认为 Balances (BalanceId's) 被多次带入结果组合".如果它有助于澄清第二个示例,请将其视为每日对帐 - 因此左联接 - 可能未针对给定帐户/会计日期组合计算期初(和/或)期末余额.
Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.
推荐答案
概念上这里是将三个表连接在一起时会发生的情况.
Conceptually here is what happens when you join three tables together.
- 优化器提出了一个计划,其中包括一个连接顺序.它可以是 A、B、C 或 C、B、A 或任何组合
- 查询执行引擎将任何谓词(
WHERE
子句)应用于不涉及任何其他表的第一个表.它选择出JOIN
条件或SELECT
列表或ORDER BY
列表中提到的列.称这个结果为 A - 它将这个结果集连接到第二个表.对于连接到第二个表的每一行,应用可能适用于第二个表的任何谓词.这会产生另一个临时结果集.
- 然后加入最终表并应用
ORDER BY
- The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A or any of the combinations
- The query execution engine applies any predicates (
WHERE
clause) to the first table that doesn't involve any of the other tables. It selects out the columns mentioned in theJOIN
conditions or theSELECT
list or theORDER BY
list. Call this result A - It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary resultset.
- Then it joins in the final table and applies the
ORDER BY
这就是概念上发生的事情.事实上,在此过程中有许多可能的优化.关系模型的优点是,良好的数学基础使计划的各种变换成为可能,同时不改变正确性.
This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.
例如,真的没有必要一路生成完整的结果集.ORDER BY
可以改为通过首先使用索引访问数据来完成.还有很多类型的连接可以完成.
For example, there is really no need to generate the full result sets along the way. The ORDER BY
may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.
这篇关于了解涉及 3 个或更多表时 JOIN 的工作原理.[SQL]的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!