Group by repeating attribute

Views: 76 | Published: 2020/5/28 18:57:30 | Tags: sql ruby-on-rails postgresql plpgsql window-functions

Problem description

Basically I have a table messages, with a user_id field that identifies the user who created the message.

When I display a conversation (a set of messages) between two users, I want to be able to group the messages by user_id, but in a tricky way:

Let's say there are some messages (sorted by created_at desc):

  id: 1, user_id: 1
  id: 2, user_id: 1
  id: 3, user_id: 2
  id: 4, user_id: 2
  id: 5, user_id: 1

I want to get 3 message groups in the below order: [1,2], [3,4], [5]

It should group by *user_id* until it sees a different one, and then group by that one.

I'm using PostgreSQL and would be happy to use something specific to it, whatever would give the best performance.

Proper SQL

@Igor presents a nice pure-SQL technique with window functions.
However:

I want to get 3 message groups in the below order: [1,2], [3,4], [5]

To get the requested order, add ORDER BY min(id):

SELECT array_agg(id) AS ids
FROM (
   SELECT id
         ,user_id
         ,row_number() OVER (ORDER BY id) -
          row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp
   FROM   messages
   ORDER  BY id) t   -- for ordered arrays in result
GROUP  BY grp, user_id
ORDER  BY min(id);

SQL Fiddle.
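To see why subtracting the two row numbers yields a usable group key, it can help to look at the intermediate values. The following is a minimal sketch, assuming the five sample rows from the question; the CTE stands in for the real messages table, and the column aliases rn_all and rn_user are illustrative names that do not appear in the original query.

```sql
-- Sample rows from the question, inlined so the query runs without a real table.
WITH messages(id, user_id) AS (
   VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 1)
)
SELECT id
     , user_id
     , row_number() OVER (ORDER BY id)                      AS rn_all   -- position overall
     , row_number() OVER (PARTITION BY user_id ORDER BY id) AS rn_user  -- position within the user
     , row_number() OVER (ORDER BY id)
     - row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp      -- constant within a run
FROM   messages
ORDER  BY id;

--  id | user_id | rn_all | rn_user | grp
-- ----+---------+--------+---------+-----
--   1 |       1 |      1 |       1 |   0
--   2 |       1 |      2 |       2 |   0
--   3 |       2 |      3 |       1 |   2
--   4 |       2 |      4 |       2 |   2
--   5 |       1 |      5 |       3 |   2
```

Within an unbroken run of one user both counters advance together, so their difference stays the same. The value is not unique across users, though: id 5 (user 1) gets the same grp as ids 3 and 4 (user 2), which is why the outer query groups by grp and user_id together.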

The addition would barely warrant another answer. The more important issue is this:

I'm using PostgreSQL and would be happy to use something specific to it, whatever would give the best performance.

Pure SQL is all nice and shiny, but a procedural server-side function is much faster for this task. While processing rows procedurally is generally slower, plpgsql wins this competition big-time, because it can make do with a single table scan and a single ORDER BY operation:

CREATE OR REPLACE FUNCTION f_msg_groups()
  RETURNS TABLE (ids int[]) AS
$func$
DECLARE
   _id    int;
   _uid   int;
   _uid0  int;                         -- user_id of previous row
BEGIN
   FOR _id, _uid IN
       SELECT id, user_id FROM messages ORDER BY id
   LOOP
       IF _uid <> _uid0 THEN           -- NULL on the first row, so never true there
          RETURN QUERY VALUES (ids);   -- output the finished group
          ids := ARRAY[_id];           -- start new array
       ELSE
          ids := ids || _id;           -- add to array
       END IF;

       _uid0 := _uid;                  -- remember previous row
   END LOOP;

   IF ids IS NOT NULL THEN             -- nothing to output for an empty table
      RETURN QUERY VALUES (ids);       -- output last group
   END IF;
END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_msg_groups();
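To try the function against the rows from the question, a throwaway table is enough. This is a sketch under assumptions: the column types and the NOT NULL constraint are guesses, since the question does not show the table definition, and a temporary table named messages shadows any permanent table of the same name for the current session.

```sql
-- Assumed minimal schema; the real messages table is not shown in the question.
CREATE TEMP TABLE messages (
   id      int PRIMARY KEY,
   user_id int NOT NULL
);

INSERT INTO messages (id, user_id)
VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 1);

SELECT * FROM f_msg_groups();

--   ids
-- -------
--  {1,2}
--  {3,4}
--  {5}
```

The NOT NULL constraint matters for the function as written: comparing with <> against a NULL user_id yields NULL rather than true, so such a row would be appended to the current group instead of starting a new one.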

Benchmark and links

I ran a quick test with EXPLAIN ANALYZE on a similar real-life table with 60k rows (execute several times, pick the fastest result to exclude caching effects):

SQL:
Total runtime: 1009.549 ms
PL/pgSQL:
Total runtime: 336.971 ms
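A comparison along these lines can be repeated roughly as follows. This is only a sketch: it assumes a populated messages table and the f_msg_groups() function from above, and timings will differ with data, hardware and PostgreSQL version. Newer PostgreSQL versions label the final line as execution time rather than "Total runtime".

```sql
-- Pure SQL variant: run several times and keep the fastest timing.
EXPLAIN ANALYZE
SELECT array_agg(id) AS ids
FROM (
   SELECT id
         ,user_id
         ,row_number() OVER (ORDER BY id) -
          row_number() OVER (PARTITION BY user_id ORDER BY id) AS grp
   FROM   messages
   ORDER  BY id) t
GROUP  BY grp, user_id
ORDER  BY min(id);

-- PL/pgSQL variant: appears as a single Function Scan node in the plan.
EXPLAIN ANALYZE
SELECT * FROM f_msg_groups();
```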

Also consider these closely related questions:

GROUP BY and aggregate sequential numeric values
GROUP BY consecutive dates delimited by gaps
Ordered count of consecutive repeats / duplicates
