PostgreSQL (Redshift): maximum value for a specific column

Question

I'm working on Redshift and have a table like this:

userid  oid version number_of_objects
1       ab  1       10
1       ab  2       20
1       ab  3       17
1       ab  4       16
1       ab  5       14
1       cd  1       5
1       cd  2       6
1       cd  3       9
1       cd  4       12
2       ef  1       4
2       ef  2       3
2       gh  1       16
2       gh  2       12
2       gh  3       21

From this table I would like to select the maximum version number for every oid, together with the userid and the number_of_objects of that row.

When I tried the following, I unfortunately got the whole table back:

SELECT MAX(version), oid, userid, number_of_objects
FROM table
GROUP BY oid, userid, number_of_objects
LIMIT 10;

The result I'm actually looking for is:

userid  oid MAX(version)    number_of_objects
1       ab  5               14
1       cd  4               12
2       ef  2               3
2       gh  3               21

DISTINCT ON doesn't work either; Redshift reports:

SELECT DISTINCT ON is not supported

Do you have any ideas?

UPDATE: In the meantime I came up with the workaround below. I don't think it's the smartest solution, and it's very slow, but at least it works:

SELECT *
FROM table,
     (SELECT MAX(version) AS maxversion, oid, userid
      FROM table
      GROUP BY oid, userid
     ) AS maxtable
WHERE table.oid = maxtable.oid
  AND table.userid = maxtable.userid
  AND table.version = maxtable.maxversion  -- compare against the aliased MAX(version)
LIMIT 100;

Do you have a better solution?

Recommended answer

Redshift supports window functions, so you might try this:

SELECT *
FROM (
  SELECT oid,
         userid,
         version,
         number_of_objects,  -- included so the result matches the requested output
         max(version) OVER (PARTITION BY oid, userid) AS max_version
  FROM the_table
) t
WHERE version = max_version;

I would expect that to be faster than a self-join with a GROUP BY.

Another option is to use the row_number() function:

SELECT *
FROM (
  SELECT oid,
         userid,
         version,
         number_of_objects,  -- included so the result matches the requested output
         row_number() OVER (PARTITION BY oid, userid ORDER BY version DESC) AS rn
  FROM the_table
) t
WHERE rn = 1;

Which one to use is mostly a matter of personal taste. Performance-wise, I wouldn't expect a difference.
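
To check the two queries against the sample data from the question, here is a minimal sketch that runs both in an in-memory SQLite database from Python. SQLite is only a stand-in for Redshift here (an assumption made so the example runs locally; window functions need SQLite 3.25 or newer), and the function name run_queries is made up for illustration. The number_of_objects column is selected as well, since the question asks for it.

```python
import sqlite3

# Sample rows from the question: (userid, oid, version, number_of_objects)
ROWS = [
    (1, "ab", 1, 10), (1, "ab", 2, 20), (1, "ab", 3, 17),
    (1, "ab", 4, 16), (1, "ab", 5, 14),
    (1, "cd", 1, 5), (1, "cd", 2, 6), (1, "cd", 3, 9), (1, "cd", 4, 12),
    (2, "ef", 1, 4), (2, "ef", 2, 3),
    (2, "gh", 1, 16), (2, "gh", 2, 12), (2, "gh", 3, 21),
]

# Variant 1: keep the rows whose version equals the per-group maximum.
MAX_QUERY = """
SELECT userid, oid, version, number_of_objects
FROM (
  SELECT userid, oid, version, number_of_objects,
         max(version) OVER (PARTITION BY oid, userid) AS max_version
  FROM the_table
) t
WHERE version = max_version
ORDER BY userid, oid
"""

# Variant 2: number the rows per group, newest version first, and keep row 1.
ROW_NUMBER_QUERY = """
SELECT userid, oid, version, number_of_objects
FROM (
  SELECT userid, oid, version, number_of_objects,
         row_number() OVER (PARTITION BY oid, userid
                            ORDER BY version DESC) AS rn
  FROM the_table
) t
WHERE rn = 1
ORDER BY userid, oid
"""


def run_queries():
    """Load the sample rows into SQLite and return the results of both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE the_table ("
        "userid INTEGER, oid TEXT, version INTEGER, number_of_objects INTEGER)"
    )
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?, ?)", ROWS)
    via_max = conn.execute(MAX_QUERY).fetchall()
    via_row_number = conn.execute(ROW_NUMBER_QUERY).fetchall()
    conn.close()
    return via_max, via_row_number


if __name__ == "__main__":
    via_max, via_row_number = run_queries()
    for row in via_max:
        print(row)
    print("same result:", via_max == via_row_number)
```

On this data both variants return the four rows listed in the question. They only differ when two rows in the same (oid, userid) group share the highest version: the max() variant returns all tied rows, while row_number() returns exactly one of them.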
