SQL join: selecting the last records in a one-to-many relationship
Question
Suppose I have a table of customers and a table of purchases. Each purchase belongs to one customer. I want to get a list of all customers along with their last purchase in one SELECT statement. What is the best practice? Any advice on building indexes?
Please use these table/column names in your answer:
- customer: id, name
- purchase: id, customer_id, item_id, date
And in more complicated situations, would it be beneficial, performance-wise, to denormalize the database by putting the last purchase into the customer table?
If the (purchase) id is guaranteed to be sorted by date, can the statements be simplified by using something like LIMIT 1?
Recommended answer
This is an example of the greatest-n-per-group problem, which has appeared regularly on StackOverflow.
Here's how I usually recommend solving it:
SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
(p1.date < p2.date OR p1.date = p2.date AND p1.id < p2.id))
WHERE p2.id IS NULL;
Explanation: given a row p1, there should be no row p2 with the same customer and a later date (or, in the case of ties, a later id). The LEFT OUTER JOIN fills p2's columns with NULL when no such row exists, which is what WHERE p2.id IS NULL tests. When we find that to be true, p1 is the most recent purchase for that customer.
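To see the explanation above in action, here is a minimal sketch that runs the same query against an in-memory SQLite database using Python's sqlite3 module. SQLite is only a stand-in engine, and the sample rows (Alice, Bob, Carol and their purchases) are invented for illustration; they are not from the original question.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    item_id INTEGER,
    date TEXT
);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO purchase VALUES
    (10, 1, 100, '2024-01-05'),
    (11, 1, 101, '2024-03-01'),
    (12, 1, 102, '2024-03-01'),  -- same date as 11; the higher id wins the tie
    (13, 2, 100, '2024-02-10');  -- Carol (3) has no purchases
""")

# The query from the answer, with explicit columns and a stable output order.
LAST_PURCHASE = """
SELECT c.id, c.name, p1.id, p1.item_id, p1.date
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
    (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL
ORDER BY c.id;
"""

rows = conn.execute(LAST_PURCHASE).fetchall()
for row in rows:
    print(row)
```

With this data the query returns purchase 12 for Alice and purchase 13 for Bob. Carol does not appear, because the first join is an inner join; changing it to a LEFT OUTER JOIN would keep customers without purchases, with NULLs in the purchase columns.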
Regarding indexes, I'd create a compound index on purchase over the columns (customer_id, date, id). That may allow the outer join to be done using a covering index. Be sure to test on your platform, because optimization is implementation-dependent. Use the features of your RDBMS to analyze the optimization plan, e.g. EXPLAIN on MySQL.
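As a rough sketch of the indexing advice, the snippet below creates the compound index and asks the engine for its plan. It uses SQLite through Python's sqlite3 module as a stand-in, so the command is EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN. The index name idx_purchase_customer_date_id is made up for this example, and the plan text differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    item_id INTEGER,
    date TEXT
);
-- Compound index over the columns the self-join compares (hypothetical name).
CREATE INDEX idx_purchase_customer_date_id
    ON purchase (customer_id, date, id);
""")

# Ask SQLite how it would execute the query from the answer.
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT c.*, p1.*
FROM customer c
JOIN purchase p1 ON (c.id = p1.customer_id)
LEFT OUTER JOIN purchase p2 ON (c.id = p2.customer_id AND
    (p1.date < p2.date OR (p1.date = p2.date AND p1.id < p2.id)))
WHERE p2.id IS NULL;
""").fetchall()

# The last column of each plan row is the human-readable step description.
for step in plan:
    print(step[-1])

# Confirm the index was registered in the schema.
index_names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]
print(index_names)
```

Whether the plan reports the index as covering depends on the engine and on table statistics, so check the output on realistic data rather than on empty tables like these.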
Some people use subqueries instead of the solution I show above, but I find my solution makes it easier to resolve ties.
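For comparison, one possible subquery variant is sketched below; it is not the approach recommended above. It picks each customer's latest purchase with a correlated subquery that orders by date and then id and keeps one row, which also touches on the LIMIT 1 idea from the question. The sketch assumes SQLite (via Python's sqlite3 module) and invented sample rows; LIMIT syntax and support inside subqueries vary between database systems.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(id),
    item_id INTEGER,
    date TEXT
);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO purchase VALUES
    (10, 1, 100, '2024-01-05'),
    (11, 1, 101, '2024-03-01'),
    (12, 1, 102, '2024-03-01'),  -- tie on date with purchase 11
    (13, 2, 100, '2024-02-10');
""")

# Correlated subquery: for each customer, find the id of the newest purchase.
# Ties on date are broken by the secondary sort key, id DESC.
SUBQUERY_VARIANT = """
SELECT c.id, c.name, p.id, p.item_id, p.date
FROM customer c
JOIN purchase p ON p.id = (
    SELECT p2.id
    FROM purchase p2
    WHERE p2.customer_id = c.id
    ORDER BY p2.date DESC, p2.id DESC
    LIMIT 1
)
ORDER BY c.id;
"""

rows = conn.execute(SUBQUERY_VARIANT).fetchall()
for row in rows:
    print(row)
```

On this data the variant returns the same two rows as the join-based query: purchase 12 for Alice and purchase 13 for Bob. If the purchase id really is guaranteed to follow date order, the ORDER BY could sort on id alone.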