Find most common elements in array with a group by


Problem description


I have a table with the following structure: name TEXT, favorite_colors TEXT[], group_name INTEGER. Each row holds one person's list of favorite colors and the group that person belongs to. How can I GROUP BY group_name and return a list of the most common colors in each group?

Could you do a combination of int[] && int[] to test for overlap, int[] & int[] to get the intersection and then something else to count and rank?

Solution

Quick and dirty:

SELECT group_name, color, count(*) AS ct
FROM (
   SELECT group_name, unnest(favorite_colors) AS color
   FROM   tbl
   ) sub
GROUP  BY 1,2
ORDER  BY 1,3 DESC;
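
To try the query, a small test table helps. The following is only a sketch: the column layout follows the question, but the temp table and the sample rows (alice, bob, carol, dave) are made up for illustration.

```sql
-- Hypothetical sample data; only the column layout comes from the question.
CREATE TEMP TABLE tbl (
   name            text
 , favorite_colors text[]
 , group_name      integer
);

INSERT INTO tbl (name, favorite_colors, group_name) VALUES
   ('alice', '{red,blue}',  1)
 , ('bob',   '{red,green}', 1)
 , ('carol', '{blue}',      2)
 , ('dave',  NULL,          2);   -- no colors at all

-- The quick and dirty query from above, unchanged.
SELECT group_name, color, count(*) AS ct
FROM  (
   SELECT group_name, unnest(favorite_colors) AS color
   FROM   tbl
   ) sub
GROUP  BY 1,2
ORDER  BY 1,3 DESC;
```

With this data, group 1 should come back with red counted twice and blue and green once each, and group 2 with blue once. The row for dave contributes nothing, because unnest() of a NULL array returns no rows.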

Better with a LATERAL JOIN

In Postgres 9.3 or later this is the cleaner form:

SELECT group_name, color, count(*) AS ct
FROM   tbl t, unnest(t.favorite_colors) AS color
GROUP  BY 1,2
ORDER  BY 1,3 DESC;

The above is shorthand for

...
FROM tbl t
JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
...

Like any other INNER JOIN, it excludes rows without colors (favorite_colors IS NULL or an empty array), as did the first query.

To include such rows in the result, use instead:

SELECT group_name, color, count(*) AS ct
FROM   tbl t
LEFT   JOIN LATERAL unnest(t.favorite_colors) AS color ON TRUE
GROUP  BY 1,2
ORDER  BY 1,3 DESC;
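
One thing to watch with the LEFT JOIN variant: a person without colors still produces one row with color NULL, so count(*) reports 1 for that (group_name, NULL) pair per such person. If you would rather see 0 there, a possible sketch is to count the color column instead of the row. The alias c(color) is introduced here only to make the column reference unambiguous.

```sql
-- Sketch: count(c.color) ignores NULLs, so groups of "no color" rows show ct = 0
-- instead of the number of people without colors.
SELECT t.group_name, c.color, count(c.color) AS ct
FROM   tbl t
LEFT   JOIN LATERAL unnest(t.favorite_colors) AS c(color) ON TRUE
GROUP  BY 1,2
ORDER  BY 1,3 DESC;
```

Which of the two counts is right depends on whether you want to know how many people in a group have no favorite color.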

You can easily aggregate the "most common" colors per group in the next step, but you'd need to define "most common colors" first ...

Most common colors

As per comment, pick colors with > 3 occurrences.

SELECT t.group_name, color, count(*) AS ct
FROM   tbl t, unnest(t.favorite_colors) AS color
GROUP  BY 1,2
HAVING count(*) > 3
ORDER  BY 1,3 DESC;
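
If "most common" should instead mean the top color(s) per group regardless of a fixed threshold, a window function over the aggregated counts is one way to get there. This is a sketch under that different definition, not what was asked for in the comment. The column name rnk is made up.

```sql
-- Sketch: keep only the color(s) with the highest count in each group.
-- rank() assigns 1 to every color tied for first place.
SELECT group_name, color, ct
FROM  (
   SELECT t.group_name, color, count(*) AS ct
        , rank() OVER (PARTITION BY t.group_name
                       ORDER BY count(*) DESC) AS rnk
   FROM   tbl t, unnest(t.favorite_colors) AS color
   GROUP  BY t.group_name, color
   ) sub
WHERE  rnk = 1
ORDER  BY group_name, color;
```

Replacing rnk = 1 with rnk <= 3 would return the top three ranks per group. Ties can make that more than three rows.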

To aggregate the top colors in an array (in descending order):

SELECT group_name, array_agg(color) AS top_colors
FROM  (
   SELECT group_name, color
   FROM   tbl t, unnest(t.favorite_colors) AS color
   GROUP  BY 1,2
   HAVING count(*) > 3
   ORDER  BY 1, count(*) DESC
   ) sub
GROUP BY 1;
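
Feeding array_agg() from a sorted subquery usually works, but the sort order is only guaranteed if it is spelled out in the aggregate call itself. A variant along those lines, as a sketch (ORDER BY inside an aggregate needs Postgres 9.0 or later; the tiebreaker on color is an addition to make the result deterministic):

```sql
-- Sketch: same threshold as above, but the array order is stated explicitly.
SELECT group_name
     , array_agg(color ORDER BY ct DESC, color) AS top_colors
FROM  (
   SELECT t.group_name, color, count(*) AS ct
   FROM   tbl t, unnest(t.favorite_colors) AS color
   GROUP  BY 1,2
   HAVING count(*) > 3
   ) sub
GROUP  BY 1;
```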

-> SQLfiddle demonstrating all.
