Limit number of rows per group from join (NOT to 1 row)
Question
Given these tables:
TABLE Stores (
store_id INT,
store_name VARCHAR,
etc
);
TABLE Employees (
employee_id INT,
store_id INT,
employee_name VARCHAR,
currently_employed BOOLEAN,
etc
);
I want to list the 15 longest-employed employees for each store (let's say the 15 with the lowest employee_id), or ALL employees for a store if there are fewer than 15 who are currently_employed='t'. I want to do it with a join clause.
I've found a lot of examples of people doing this for only 1 row, usually a min or max (the single longest-employed employee), but I basically want to combine an ORDER BY and a LIMIT inside of the join. Some of those examples can be found here:
- Limit results from joined table to one row
- MySQL returning 1 image for each product
I've also found decent examples for doing this store-by-store (which I don't want to do, as I have about 5000 stores):
I've also seen that you can use TOP instead of ORDER BY and LIMIT, but not in PostgreSQL.
I reckon that a join clause between the two tables isn't the only way to do this (or even necessarily the best way), if it's possible to just work by distinct store_id inside of the employees table, so I'd be open to other approaches. I can always join afterwards.
As I'm very new to SQL, I'd like any theory background or additional explanation that can help me understand the principles at work.
Recommended answer
row_number()
The general solution to get the top n rows per group is the window function row_number():
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
FROM employees
WHERE currently_employed
) e
JOIN stores s USING (store_id)
WHERE rn <= 15
ORDER BY store_id, e.rn;
- PARTITION BY should use store_id, which is guaranteed to be unique (as opposed to store_name).
- First identify rows in employees, then join to stores; that's cheaper.
- To get 15 rows, use row_number(), not rank() (which would be the wrong tool for the purpose). As long as employee_id is unique, the difference doesn't show.
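To make the difference between row_number() and rank() visible, here is a small sketch with made-up data. The hire_date column and the three rows are assumptions for illustration only; they are not part of the schema in the question. Two employees share the same hire_date, so the sort key is not unique.

```sql
-- Hypothetical rows: employees 1 and 2 tie on hire_date.
SELECT employee_id
     , hire_date
     , row_number() OVER (ORDER BY hire_date) AS rn   -- always 1, 2, 3 (tie order is arbitrary)
     , rank()       OVER (ORDER BY hire_date) AS rnk  -- tied rows share a rank: 1, 1, 3
FROM  (
   VALUES (1, DATE '2010-01-01')
        , (2, DATE '2010-01-01')
        , (3, DATE '2011-06-01')
   ) AS t(employee_id, hire_date);
```

Filtering on rnk <= 1 would return both tied rows, while rn <= 1 returns exactly one. With a non-unique sort key, a filter like rank() <= 15 can therefore return more than 15 rows per group.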
An alternative for Postgres 9.3+ that typically performs better in combination with a matching index, especially when retrieving a small selection from a big table.
SELECT s.store_name, e.*
FROM   stores s
, LATERAL (
   SELECT *  -- or just needed columns
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e
-- WHERE ... possibly select only a few stores
ORDER  BY s.store_name, e.store_id, e.employee_id;
The perfect index would be a partial multicolumn index like this:
CREATE INDEX ON employees (store_id, employee_id) WHERE currently_employed
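One way to check whether the partial index is actually used is to look at the query plan. The sketch below assumes the index above has been created and restricts the query to a few stores; the store_id values 1, 2 and 3 are placeholders, not values from the question.

```sql
-- EXPLAIN (ANALYZE) runs the query and reports the plan that was really used.
EXPLAIN (ANALYZE, BUFFERS)
SELECT s.store_name, e.*
FROM   stores s
, LATERAL (
   SELECT *
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed       -- must match the index predicate
   ORDER  BY employee_id
   LIMIT  15
   ) e
WHERE  s.store_id IN (1, 2, 3)     -- hypothetical store ids
ORDER  BY s.store_name, e.employee_id;
```

If the planner picks the index, the plan for the subquery should show an index scan on employees rather than a sequential scan. Whether it does depends on table size and statistics, so the result may differ on a small test table.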
Details depend on missing details in the question. Related example:
Both versions exclude stores without current employees. There are ways around this if you need it ...
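One possible way to keep such stores, sketched here as an assumption about what is needed rather than as part of the original answer, is to switch to an outer join. For the LATERAL variant that means LEFT JOIN LATERAL ... ON true; for the row_number() variant, the rn filter has to move into the join condition so that it does not discard the NULL-extended rows.

```sql
-- LATERAL variant: stores without current employees appear once,
-- with NULL in all employee columns.
SELECT s.store_name, e.*
FROM   stores s
LEFT   JOIN LATERAL (
   SELECT *
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e ON true
ORDER  BY s.store_name, e.employee_id;

-- row_number() variant: the rn <= 15 condition sits in ON, not in WHERE.
SELECT s.store_name, e.*
FROM   stores s
LEFT   JOIN (
   SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
   FROM   employees
   WHERE  currently_employed
   ) e ON e.store_id = s.store_id AND e.rn <= 15
ORDER  BY s.store_name, e.rn;
```

Putting rn <= 15 in a WHERE clause instead would filter out the rows where e.rn is NULL and silently turn the outer join back into an inner join.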