Group rows based on column sum value

Problem description

I have a table with the two columns shown on the left below; the desired output, on the right, adds a third column, group_id:

id   | num_rows                         id   | num_rows | group_id
-----|---------                         -----|----------|--------
2502 | 330                              2502 | 330      | 9
3972 | 150                              3972 | 150      | 9
3988 | 200          =============>      3988 | 200      | 8
4228 | 280          Desired output      4228 | 280      | 8
3971 | 510          =============>      3971 | 510      | 1
52   | 1990                             52   | 1990     | 2
895  | 2000                             895  | 2000     | 3
812  | 5596                             812  | 5596     | 4
1600 | 7462                             1600 | 7462     | 5
910  | 7526                             910  | 7526     | 6
638  | 11569                            638  | 11569    | 7

id is a unique identifier for something, while num_rows corresponds to the number of rows each id has in another table.

I would like to group the rows (i.e., the id column) such that the sum of num_rows within a group never exceeds a specified value (in this case, let's say 500).

Simply put: I want to put the ids into buckets so that no bucket has more than 500 rows. If a single id is bigger than the limit, it gets its own separate group/bucket.

So far, I have been able to separate out the larger ids using the following query, but I have not been able to create groups for the remaining ids.

SELECT id, 
        num_rows,
        SUM(CASE WHEN num_rows > 500 THEN 1 ELSE 0 END) OVER(ORDER BY num_rows) AS group_id
FROM myTable;

id   | num_rows | group_id
-----|----------|--------
2502 | 330      | 0
3972 | 150      | 0
3988 | 200      | 0
4228 | 280      | 0
3971 | 510      | 1
52   | 1990     | 2
895  | 2000     | 3
812  | 5596     | 4
1600 | 7462     | 5
910  | 7526     | 6
638  | 11569    | 7

Thanks.

Recommended answer

I personally would prefer a PL/SQL function for this task, but if you want to do it in pure SQL you can use the following query:

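-- Number the rows. Note: ROWNUM without an ORDER BY does not guarantee a stable order.
WITH ord AS (SELECT id, num_rows, ROWNUM ord FROM myTable)
   , rek(ord, id, num_rows, sum_rows, groupId) AS
         (-- Anchor member: the first row opens group 1.
          SELECT ord, id, num_rows, num_rows, 1 FROM ord WHERE ord = 1
          UNION ALL
          SELECT rek.ord + 1
               , ord.id
               , ord.num_rows
                 -- Running sum: restart it when the limit would be exceeded.
               , CASE WHEN rek.sum_rows + ord.num_rows > 500
                      THEN ord.num_rows
                      ELSE rek.sum_rows + ord.num_rows
                 END
                 -- Group id: move to the next group when the limit would be exceeded.
               , CASE WHEN rek.sum_rows + ord.num_rows > 500
                      THEN rek.groupId + 1
                      ELSE rek.groupId
                 END
            FROM rek
            JOIN ord
              ON ord.ord = rek.ord + 1)
SELECT id, num_rows, groupId
  FROM rek
/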
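
To make it easier to follow, the row-by-row pass that the recursive query above is meant to perform can be sketched outside the database. This is only an illustration in Python, not part of the original answer; the function name assign_groups and the in-memory list of (id, num_rows) tuples are assumptions made for the example.

```python
def assign_groups(rows, limit=500):
    """Walk the rows in the given order and start a new group whenever
    adding the next row would push the running sum above the limit.

    rows is a list of (id, num_rows) tuples (hypothetical input format).
    """
    result = []
    group_id = 0       # no group opened yet
    running_sum = 0    # sum of num_rows in the current group
    for row_id, num_rows in rows:
        if group_id == 0 or running_sum + num_rows > limit:
            # Open a new group; the running sum restarts at this row.
            group_id += 1
            running_sum = num_rows
        else:
            # The row still fits into the current group.
            running_sum += num_rows
        result.append((row_id, num_rows, group_id))
    return result


# Sample data from the question, in the order shown there.
sample = [
    (2502, 330), (3972, 150), (3988, 200), (4228, 280), (3971, 510),
    (52, 1990), (895, 2000), (812, 5596), (1600, 7462), (910, 7526),
    (638, 11569),
]

for row in assign_groups(sample):
    print(row)
```

On the sample data this puts 2502 and 3972 together, 3988 and 4228 together, and every remaining id into a group of its own. The group numbers differ from the ones in the desired output, but the grouping itself is the same.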

Note that this query only walks the rows in order; it does not search for matching entries that would fill each group as close to the limit of 500 as possible. That search is closely related to the so-called knapsack problem (see https://en.wikipedia.org/wiki/Knapsack_problem), which is anything but easy to solve.
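
To illustrate why the choice of entries matters, here is a small Python sketch of first-fit decreasing, a common packing heuristic. It is not what the query does and it does not guarantee an optimal result; the function name first_fit_decreasing and the sample values are assumptions chosen for the example.

```python
def first_fit_decreasing(rows, limit=500):
    """Heuristic packing: sort rows by num_rows (largest first) and put
    each row into the first group that still has room for it.

    rows is a list of (id, num_rows) tuples (hypothetical input format).
    A row larger than the limit ends up alone in its own group.
    """
    groups = []  # each group: {"total": int, "ids": [ids]}
    for row_id, num_rows in sorted(rows, key=lambda r: r[1], reverse=True):
        for group in groups:
            if group["total"] + num_rows <= limit:
                group["total"] += num_rows
                group["ids"].append(row_id)
                break
        else:
            # No existing group has room, so open a new one.
            groups.append({"total": num_rows, "ids": [row_id]})
    return groups


# Hypothetical values where the order of the rows matters.
rows = [(1, 300), (2, 300), (3, 200), (4, 200)]
for group in first_fit_decreasing(rows):
    print(group)
```

Walking these four rows strictly in order yields three groups (300, then 300 + 200, then 200), while the heuristic pairs each 300 with a 200 and needs only two.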
