Grouping by a Top N in MySQL
Problem description
There are a lot of SQL Top N questions on Stack Overflow, but I can't seem to find one that matches the situation I'm having. I would like to perform some grouping within a top n query. My data looks like this (obviously with fake values).
MY_DATE IP_ADDRESS
1/1/09 999.999.999.999
1/1/09 999.999.999.999
1/1/09 999.999.999.998
... a lot more rows
The date range for the table covers several months and has many thousands of rows per month. What I would like to do is have a single query tell me which 10 IP Addresses occurred the most frequently for each month. I can do this for a single month using the following:
SELECT DATE_FORMAT(MY_DATE, '%b-%y') AS "MONTH", IP_ADDRESS, COUNT(*) AS HITS
FROM MY_DATA
WHERE DATE_FORMAT(MY_DATE, '%b-%y') = 'JAN-09'
GROUP BY DATE_FORMAT(MY_DATE, '%b-%y'), IP_ADDRESS
ORDER BY HITS DESC
LIMIT 10
But what I really want is to be able to see the top n for every month in the data set. That essentially prohibits me from using the WHERE clause I specified. Of course, when I drop it, I just get the top 10 across all months. The result I'm looking for should look like this:
MONTH IP_ADDRESS COUNT(*)
JAN-09 999.999.999.999 200
JAN-09 999.999.999.998 150
... ( 8 more rows of January )
FEB-09 999.999.999.999 320
FEB-09 999.999.999.998 234
... ( 8 more rows of February)
MAR-09 999.999.999.999 440
... ETC.
Can this be done in MySQL? It seems the barrier I'm hitting is that MySQL doesn't allow an ORDER BY within a query statement included in a UNION. Thanks for the help!
Recommended answer
I just tried a query very similar to the one given by @Charles Bretana and it does work. I used a VIEW to help clarify things.
CREATE TABLE my_data (
my_date DATE,
ip_address CHAR(15)
);
Insert a bunch of date/IPaddress pairs (not shown)...
Create a view for all counts per month and IP address:
CREATE VIEW my_data_per_month as
SELECT EXTRACT(YEAR_MONTH FROM my_date) AS month,
ip_address, COUNT(*) AS hits
FROM my_data
GROUP BY month, ip_address;
SELECT * FROM my_data_per_month
ORDER BY month ASC, hits DESC;
+--------+-----------------+------+
| month | ip_address | hits |
+--------+-----------------+------+
| 200901 | 999.999.999.999 | 8 |
| 200901 | 999.999.999.998 | 6 |
| 200901 | 999.999.999.997 | 5 |
| 200901 | 999.999.999.996 | 4 |
| 200901 | 999.999.999.995 | 3 |
| 200901 | 999.999.999.994 | 2 |
| 200902 | 999.999.999.998 | 8 |
| 200902 | 999.999.999.997 | 6 |
| 200902 | 999.999.999.996 | 5 |
| 200902 | 999.999.999.995 | 4 |
| 200902 | 999.999.999.994 | 3 |
| 200902 | 999.999.999.993 | 2 |
| 200903 | 999.999.999.997 | 8 |
| 200903 | 999.999.999.996 | 6 |
| 200903 | 999.999.999.995 | 5 |
| 200903 | 999.999.999.994 | 4 |
| 200903 | 999.999.999.993 | 3 |
| 200903 | 999.999.999.992 | 2 |
+--------+-----------------+------+
Now show the top three IP addresses per month:
SELECT m1.month, m1.ip_address, m1.hits
FROM my_data_per_month m1
LEFT OUTER JOIN my_data_per_month m2
ON (m1.month = m2.month AND m1.hits < m2.hits)
GROUP BY m1.month, m1.ip_address, m1.hits
HAVING COUNT(*) < 3
ORDER BY m1.month ASC, m1.hits DESC;
+--------+-----------------+------+
| month | ip_address | hits |
+--------+-----------------+------+
| 200901 | 999.999.999.999 | 8 |
| 200901 | 999.999.999.998 | 6 |
| 200901 | 999.999.999.997 | 5 |
| 200902 | 999.999.999.998 | 8 |
| 200902 | 999.999.999.997 | 6 |
| 200902 | 999.999.999.996 | 5 |
| 200903 | 999.999.999.997 | 8 |
| 200903 | 999.999.999.996 | 6 |
| 200903 | 999.999.999.995 | 5 |
+--------+-----------------+------+
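The self-join above works by counting, for each row, how many rows in the same month have strictly more hits, and keeping the rows where that count is below three. On MySQL 8.0 or later the same idea can be written with a window function. This is only a sketch and was not part of the original answer: it assumes the my_data_per_month view defined above, and the alias ranked and the column rn are names made up for illustration.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions are not available in 5.x).
-- Reuses the my_data_per_month view; `ranked` and `rn` are illustrative names.
SELECT month, ip_address, hits
FROM (
    SELECT month, ip_address, hits,
           -- Number the rows within each month, highest hit count first;
           -- ip_address is only a tie-breaker to make the order deterministic.
           ROW_NUMBER() OVER (PARTITION BY month
                              ORDER BY hits DESC, ip_address) AS rn
    FROM my_data_per_month
) AS ranked
WHERE rn <= 3          -- change 3 to 10 for the top ten per month
ORDER BY month ASC, hits DESC;
```

One difference to be aware of: ROW_NUMBER() returns exactly three rows per month even when hit counts tie, whereas the self-join keeps every row tied for third place. Swapping ROW_NUMBER() for RANK() should reproduce the self-join's tie behavior. On the sample data shown above, which has no ties within a month, both forms should return the same nine rows.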