Postgres, table1 left join table2 with only 1 row per ID in table1

Views: 115 | Posted: 2020/5/29 23:47:34 | Tags: sql, postgresql, greatest-n-per-group


Problem description

Ok, so the title is a bit convoluted. This is basically a greatest-n-per-group type problem, but I can't for the life of me figure it out.

I have a table, user_stats:

      Column      |  Type   |                        Modifiers
------------------+---------+---------------------------------------------------------
 id               | bigint  | not null default nextval('user_stats_id_seq'::regclass)
 user_id          | bigint  | not null
 datestamp        | integer | not null
 post_count       | integer | 
 friends_count    | integer | 
 favourites_count | integer |  
Indexes:
    "user_stats_pk" PRIMARY KEY, btree (id)
    "user_stats_datestamp_index" btree (datestamp)
    "user_stats_user_id_index" btree (user_id)
Foreign-key constraints:
    "user_user_stats_fk" FOREIGN KEY (user_id) REFERENCES user_info(id)

I want to get the stats for each user_id at its latest datestamp. This is a biggish table, somewhere in the neighborhood of 41m rows, so I've created a temp table of user_id and that user's latest datestamp (as date) using:

CREATE TEMP TABLE id_max_date AS
    (SELECT user_id, MAX(datestamp) AS date FROM user_stats GROUP BY user_id);

The problem is that datestamp isn't unique, since there can be more than one stat update in a day (it should have been a real timestamp, but the guy who designed this was kind of an idiot and there's too much data to go back at the moment). So some IDs have multiple rows when I do the JOIN:

SELECT user_stats.user_id, user_stats.datestamp, user_stats.post_count,
       user_stats.friends_count, user_stats.favourites_count
  FROM id_max_date JOIN user_stats
    ON id_max_date.user_id = user_stats.user_id
   AND id_max_date.date = user_stats.datestamp;

If I were doing this as subselects I guess I could LIMIT 1, but I've always heard those are horribly inefficient. Thoughts?
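
For what it's worth, one way this kind of greatest-n-per-group query is often written in Postgres is with DISTINCT ON, which keeps only the first row of each user_id group in ORDER BY order. The following is a minimal sketch, not a tested answer for this table: it assumes that, among rows sharing the same datestamp, the row with the highest id is the most recent one (id comes from a sequence, but nothing in the schema guarantees this), and the index name is hypothetical.

```sql
-- Sketch: exactly one row per user_id, taken at that user's latest datestamp.
-- Assumption: when several rows share the latest datestamp, the one with the
-- highest id is treated as the newest. Swap the tie-breaker if that is wrong.
SELECT DISTINCT ON (user_id)
       user_id, datestamp, post_count, friends_count, favourites_count
  FROM user_stats
 ORDER BY user_id, datestamp DESC, id DESC;

-- Optional and hypothetical: an index matching the sort order may help the
-- planner avoid sorting all ~41m rows. Check with EXPLAIN before relying on it.
CREATE INDEX user_stats_user_date_id_index
    ON user_stats (user_id, datestamp DESC, id DESC);
```

The leading ORDER BY expressions have to match the DISTINCT ON list, which is why user_id comes first. Written this way the id_max_date temp table is not needed, since the latest row is picked directly from user_stats.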
