How to select most frequent value in a column per each id group?

Question

I have a table in SQL that looks like this:

user_id | data1
0       | 6
0       | 6
0       | 6
0       | 1
0       | 1
0       | 2
1       | 5
1       | 5
1       | 3
1       | 3
1       | 3
1       | 7

I want to write a query that returns two columns: a column for the user id, and a column for what the most frequently occurring value per id is. In my example, for user_id 0, the most frequent value is 6, and for user_id 1, the most frequent value is 3. I would want it to look like below:

user_id | most_frequent_value
0       | 6
1       | 3

I am using the query below to get the most frequent value, but it runs against the whole table and returns the most common value for the whole table instead of for each id. What would I need to add to my query to get it to return the most frequent value for each id? I am thinking I need to use a subquery, but am unsure of how to structure it.

SELECT user_id, data1 AS most_frequent_value
FROM my_table
GROUP BY user_id, data1
ORDER BY COUNT(*) DESC LIMIT 1


Recommended answer

With the right ORDER BY, DISTINCT ON (user_id) does the job: it keeps only the first row of each user_id group, so sorting each group by count in descending order leaves the most frequent value. DISTINCT ON is a PostgreSQL-specific feature.

-- Count each (user_id, data1) pair, then keep the highest-count row per user_id
SELECT DISTINCT ON (user_id) user_id, most_frequent_value
FROM (
    SELECT user_id, data1 AS most_frequent_value, COUNT(*) AS _count
    FROM my_table
    GROUP BY user_id, data1
) a
ORDER BY user_id, _count DESC;
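
Since DISTINCT ON only exists in PostgreSQL, a window-function variant may help on other engines. The sketch below is not part of the original answer. It loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and ranks each (user_id, data1) pair by its count with ROW_NUMBER(). The function name most_frequent_per_user, the CTE names counts and ranked, and the data1 tie-breaker are assumptions made for illustration. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows copied from the question's table: (user_id, data1).
ROWS = [(0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
        (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7)]

# Portable variant of the DISTINCT ON query:
#   1. counts: how often each data1 value occurs per user_id
#   2. ranked: number the rows of each user_id, highest count first
#   3. keep only the first row of each user_id
# The secondary sort on data1 is an assumption added here so that ties
# between equally frequent values resolve deterministically.
QUERY = """
WITH counts AS (
    SELECT user_id, data1, COUNT(*) AS cnt
    FROM my_table
    GROUP BY user_id, data1
),
ranked AS (
    SELECT user_id, data1,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY cnt DESC, data1
           ) AS rn
    FROM counts
)
SELECT user_id, data1 AS most_frequent_value
FROM ranked
WHERE rn = 1
ORDER BY user_id
"""


def most_frequent_per_user(rows):
    """Return [(user_id, most_frequent_value), ...] for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_table (user_id INTEGER, data1 INTEGER)")
        conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for user_id, value in most_frequent_per_user(ROWS):
        print(user_id, value)
```

The DISTINCT ON query above leaves ties to the database unless a further sort key is added. Here the smaller data1 wins a tie, which is an arbitrary choice.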
