Limit number of rows per group from join (NOT to 1 row)

Question

Given these tables:

CREATE TABLE Stores (
 store_id INT,
 store_name VARCHAR
 -- etc
);

CREATE TABLE Employees (
 employee_id INT,
 store_id INT,
 employee_name VARCHAR,
 currently_employed BOOLEAN
 -- etc
);

I want to list the 15 longest-employed employees for each store (let's say the 15 with the lowest employee_id), or ALL employees for a store if there are fewer than 15 who are currently_employed='t'. I want to do it with a join clause.

I've found a lot of examples of people doing this only for 1 row, usually a min or max (the single longest-employed employee), but I basically want to combine an ORDER BY and a LIMIT inside of the join. Some of those examples can be found here:

  • Limit results from joined table to one row
  • MySQL returning 1 image for each product

I've also found decent examples for doing this store-by-store (which I don't want to do, since I have about 5000 stores).

I've also seen that you can use TOP instead of ORDER BY and LIMIT, but not in PostgreSQL.

I reckon that a join clause between the two tables isn't the only way to do this (or even necessarily the best way), if it's possible to just work by distinct store_id inside the employees table, so I'd be open to other approaches. I can always join afterwards.

As I'm very new to SQL, I'd like any theory background or additional explanation that can help me understand the principles at work.

Recommended answer

row_number()

The general solution to get the top n rows per group is the window function row_number():

SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
   FROM   employees
   WHERE  currently_employed
   ) e
JOIN   stores s USING (store_id)
WHERE  rn <= 15
ORDER  BY store_id, e.rn;

  • PARTITION BY should use store_id, which is guaranteed to be unique (as opposed to store_name).

  • First identify rows in employees, then join to stores. That's cheaper.

  • To get 15 rows, use row_number(), not rank(), which would be the wrong tool for the purpose. As long as employee_id is unique, the difference doesn't show.
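
The difference between row_number() and rank() only shows once the sort key has ties. Here is a minimal sketch with made-up data; the hire_year column and all values are assumptions for illustration and are not part of the question's schema:

```sql
-- Hypothetical sample data: hire_year is NOT in the question's schema.
-- It is used here only because, unlike employee_id, it can contain ties.
WITH emp (employee_id, store_id, hire_year) AS (
   VALUES (1, 1, 2010)
        , (2, 1, 2011)
        , (3, 1, 2011)  -- ties with employee 2
        , (4, 1, 2012)
   )
SELECT employee_id
     , row_number() OVER (PARTITION BY store_id ORDER BY hire_year) AS rn   -- always 1, 2, 3, 4
     , rank()       OVER (PARTITION BY store_id ORDER BY hire_year) AS rnk  -- 1, 2, 2, 4
FROM   emp
ORDER  BY employee_id;
```

Filtering on rnk <= 2 would return three rows here, because both tied employees share rank 2, while rn <= 2 returns exactly two. Which of the tied rows gets rn = 2 is arbitrary unless the ORDER BY is made deterministic, for example by adding employee_id as a tiebreaker.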

An alternative for Postgres 9.3+ uses a LATERAL join. It typically performs better in combination with a matching index, especially when retrieving a small selection from a big table:

SELECT s.store_name, e.*
FROM   stores s
, LATERAL (
   SELECT *  -- or just needed columns
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e
-- WHERE ... possibly select only a few stores
ORDER  BY s.store_name, e.store_id, e.employee_id;

The perfect index would be a partial multicolumn index like this:

CREATE INDEX ON employees (store_id, employee_id) WHERE currently_employed;

The details depend on information that is missing from the question.

Both versions exclude stores without current employees. There are ways around this if you need it ...
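
One common way to keep stores without current employees is to switch to a LEFT JOIN. The following is a sketch under the question's schema, not necessarily what the answer had in mind; such stores come back as a single row with NULL in all employee columns:

```sql
-- LATERAL variant: LEFT JOIN ... ON true keeps every store,
-- even when the subquery returns no rows for it.
SELECT s.store_name, e.*
FROM   stores s
LEFT   JOIN LATERAL (
   SELECT *
   FROM   employees
   WHERE  store_id = s.store_id
   AND    currently_employed
   ORDER  BY employee_id
   LIMIT  15
   ) e ON true
ORDER  BY s.store_name, s.store_id, e.employee_id;

-- row_number() variant: the rn filter moves into the join condition,
-- because a WHERE rn <= 15 would discard the NULL-extended rows again.
SELECT s.store_name, e.*
FROM   stores s
LEFT   JOIN (
   SELECT *, row_number() OVER (PARTITION BY store_id ORDER BY employee_id) AS rn
   FROM   employees
   WHERE  currently_employed
   ) e ON e.store_id = s.store_id AND e.rn <= 15
ORDER  BY s.store_name, s.store_id, e.rn;
```

Both queries sort by s.store_id rather than e.store_id, since e.store_id is NULL for stores that have no matching employees.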
