Terribly slow SQL query with COUNT and GROUP BY on two columns
Question
I'm archiving a web forum that normally gets purged about once a week, so I'm screen-scraping it and storing the posts in my database (PostgreSQL).
I also do a little analysis on the data, with some graphs for users to enjoy, such as what time of day the forum is most active.
So I have a posts table, like so:
Column | Type
------------+------------------------------
id | integer
body | text
created_at | timestamp without time zone
topic_id | integer
user_name | text
user_id | integer
Now I want a post count for each user, for my little "top 10 posters" table.
I came up with this:
SELECT user_id, user_name, count(*)
FROM posts
GROUP BY user_id, user_name
ORDER BY count DESC LIMIT 10
This turns out to be very slow: 9 seconds, with only about 300,000 rows in the posts table at the moment.
It takes only half a second if I group on just one column, but I need both.
I'm rather new to relational databases and SQL, so I'm not sure whether this is the right approach, or what I'm doing wrong.
Recommended answer
There's probably only one user with a particular ID, so max(user_name) should equal user_name. Then you can group on a single column, which your post indicates works faster:
SELECT user_id, max(user_name), count(*)
FROM posts
GROUP BY user_id
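
As a rough check of that reasoning, here is a minimal sketch that runs both forms of the query and compares the results. It rests on a few assumptions: an in-memory SQLite database stands in for PostgreSQL so the example is self-contained, the sample rows are made up, the post_count alias is introduced only for illustration, and the ORDER BY and LIMIT clauses are carried over from the question. It shows only that the two queries agree when each user_id maps to exactly one user_name; it says nothing about timing.

```python
import sqlite3

# Assumption: SQLite in memory stands in for PostgreSQL, and the rows below
# are invented sample data. Each user_id maps to exactly one user_name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, "
    "created_at TEXT, topic_id INTEGER, user_name TEXT, user_id INTEGER)"
)
sample = [(1, "alice")] * 5 + [(2, "bob")] * 3 + [(3, "carol")] * 1
conn.executemany(
    "INSERT INTO posts (user_id, user_name) VALUES (?, ?)", sample
)

# Form from the question: group on both columns.
two_columns = conn.execute(
    "SELECT user_id, user_name, count(*) AS post_count "
    "FROM posts GROUP BY user_id, user_name "
    "ORDER BY post_count DESC LIMIT 10"
).fetchall()

# Form from the answer: group on user_id only and take max(user_name).
one_column = conn.execute(
    "SELECT user_id, max(user_name) AS user_name, count(*) AS post_count "
    "FROM posts GROUP BY user_id "
    "ORDER BY post_count DESC LIMIT 10"
).fetchall()

print(one_column)
print(two_columns == one_column)
```

If a user had changed their name between posts, the two-column query would return one row per (user_id, user_name) pair, while the single-column query would merge them under the alphabetically greatest name, so the results would then differ.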