了解涉及3个或更多表时JOIN的工作方式. [SQL] [英] Understanding how JOIN works when 3 or more tables are involved. [SQL]
问题描述
我想知道是否有人可以帮助提高我对SQL JOIN的理解. [如果对这个问题很重要,我正在专门考虑MS SQL Server.]
获取3个表A,B [与某A.AId相关的A]和C [B与某B.BId相关的C]]
如果我撰写查询,例如
SELECT *
FROM A JOIN B
ON A.AId = B.AId
一切都很好-我对它的工作方式很满意.
当表C(或其他D,E,....被添加)时会发生什么
在这种情况下
SELECT *
FROM A JOIN B
ON A.AId = B.AId
JOIN C ON C.BId = B.BId
什么是C加入? -是B表(以及B表中的值吗?) 还是将C表联接到A + B联接的结果中得出一些其他临时结果集?
[根据A,B的连接条件,并非B表中的所有值都必然包含在临时结果集A + B中]
我为什么要问的一个特定(且相当人为)的示例是因为我试图理解以下行为:
Tables
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)
Where:
Account->Opening, and Closing Balances are NULLABLE
(may have opening balance, closing balance, or none)
Balance->BalanceToken is 1:m - a balance could consist of many tokens
从概念上讲,某个日期的期末余额将是明天的期初余额
如果我要查找一个帐户的所有期初和期末余额的列表
我可能会做类似的事情
SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
LEFT JOIN BALANCE OpeningBal
ON A.OpeningBalanceId = OpeningBal.BalanceId
LEFT JOIN BALANCE ClosingBal
ON A.ClosingBalanceId = ClosingBal.BalanceId
LEFT JOIN BalanceToken openingBalanceAmounts
ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
LEFT JOIN BalanceToken closingBalanceAmounts
ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate
事情如我所料,直到最后一个JOIN引入期末余额令牌-我最终在结果中得到重复.
[我可以使用DISTINCT进行修复-但我试图理解为什么发生了什么事>
有人告诉我问题是因为Balance和BalanceToken之间的关系是1:M-并且当我引入最后一个JOIN时,我得到了重复,因为第三个JOIN已经多次将BalanceIds引入到(我假设)是临时结果集.
我知道示例表不符合良好的数据库设计
为这篇文章道歉,谢谢您的启发:)
根据马克的问题进行编辑
从概念上讲,对于一个帐户(每个AccountingDate),BalanceToken中不应存在重复项-我认为问题是由于1个Account/AccountingDates的期末余额是第二天的帐户期初余额-因此,当自我加入时多次使用Balance,BalanceToken获得期初和期末余额我认为Balances(BalanceId's)被多次引入结果组合"中.如果有助于阐明第二个示例,则可以将其视为每日对帐-因此是左联接-对于给定的帐户/会计日期组合,可能尚未计算期初(和/或)期末余额.
概念上这是将三个表连接在一起时发生的情况.
- 优化器提供了一个计划,其中包括一个连接顺序.可以是A,B,C或C,B,A或任何组合
- 查询执行引擎将任何谓词(
WHERE
子句)应用于不涉及任何其他表的第一个表.它选择JOIN
条件或SELECT
列表或ORDER BY
列表中提到的列.将此结果称为A - 它将结果集连接到第二个表.对于每一行,它会连接到第二个表,并应用可能适用于第二个表的所有谓词.这将导致另一个临时结果集.
- 然后将其加入决赛桌并应用
ORDER BY
从概念上讲,这是会发生的事情.实际上,在此过程中有许多可能的优化.关系模型的优点是可靠的数学基础使计划的各种转换成为可能,而不会改变正确性.
例如,实际上一路没有必要生成完整的结果集.相反,可以首先使用索引通过访问数据来完成ORDER BY
.可以完成很多类型的联接.
I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]
Take 3 tables A, B [A related to be by some A.AId], and C [B related to C by some B.BId]
If I compose a query e.g
SELECT *
FROM A JOIN B
ON A.AId = B.AId
All good - I'm sweet with how this works.
What happens when Table C (Or some other D,E, .... gets added)
In the situation
SELECT *
FROM A JOIN B
ON A.AId = B.AId
JOIN C ON C.BId = B.BId
What is C joining to? - is it that B table (and the values in the B table?) Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?
[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:
Tables
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)
Where:
Account->Opening, and Closing Balances are NULLABLE
(may have opening balance, closing balance, or none)
Balance->BalanceToken is 1:m - a balance could consist of many tokens
Conceptually, Closing Balance of a date, would be tomorrows opening balance
If I was trying to find a list of all the opening and closing balances for an account
I might do something like
SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
LEFT JOIN BALANCE OpeningBal
ON A.OpeningBalanceId = OpeningBal.BalanceId
LEFT JOIN BALANCE ClosingBal
ON A.ClosingBalanceId = ClosingBal.BalanceId
LEFT JOIN BalanceToken openingBalanceAmounts
ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
LEFT JOIN BalanceToken closingBalanceAmounts
ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate
Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.
[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]
I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
I know that the example tables do not conform to good DB design
Apologies for the essay, thanks for any elightenment :)
Edit in response to question by Marc
Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.
Conceptually here is what happens when you join three tables together.
- The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A or any of the combinations
- The query execution engine applies any predicates (
WHERE
clause) to the first table that doesn't involve any of the other tables. It selects out the columns mentioned in theJOIN
conditions or theSELECT
list or theORDER BY
list. Call this result A - It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary resultset.
- Then it joins in the final table and applies the
ORDER BY
This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.
For example, there is really no need to generate the full result sets along the way. The ORDER BY
may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.
这篇关于了解涉及3个或更多表时JOIN的工作方式. [SQL]的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!