每个关键字的前'n'个搜索结果 [英] Top 'n' results for each keyword

查看:82
本文介绍了每个关键字的前'n'个搜索结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个查询来查询对特定关键字进行评论的前n位用户

I have a query to get the top 'n' users who commented on a specific keyword,

SELECT `user` , COUNT( * ) AS magnitude
FROM `results`
WHERE `keyword` = "economy"
GROUP BY `user`
ORDER BY magnitude DESC
LIMIT 5 

我大约有6000个关键字,并且希望运行此查询,以使我获得我们拥有数据的每个关键字的前'n'个用户.协助表示赞赏.

I have approx 6000 keywords, and would like to run this query to get me the top 'n' users for each and every keyword we have data for. Assistance appreciated.

推荐答案

由于您没有提供results的架构,因此我假设它是这个或非常相似的(也许是额外的列):

Since you haven't given the schema for results, I'll assume it's this or very similar (maybe extra columns):

create table results (
  id int primary key,
  user int,
    foreign key (user) references <some_other_table>(id),
  keyword varchar(<30>)
);

第1步:与示例查询一样,按keyword/user进行汇总,但适用于所有关键字:

Step 1: aggregate by keyword/user as in your example query, but for all keywords:

create view user_keyword as (
  select
    keyword,
    user,
    count(*) as magnitude
  from results
  group by keyword, user
);

步骤2:对每个关键字组中的每个用户进行排名(请注意使用子查询对行进行排名):

Step 2: rank each user within each keyword group (note the use of the subquery to rank the rows):

create view keyword_user_ranked as (
  select 
    keyword,
    user,
    magnitude,
    (select count(*) 
     from user_keyword 
     where l.keyword = keyword and magnitude >= l.magnitude
    ) as rank
  from
    user_keyword l
);

第3步::仅选择排名小于某个数字的行:

Step 3: select only the rows where the rank is less than some number:

select * 
from keyword_user_ranked 
where rank <= 3;


示例:


Example:

使用的基础数据:

mysql> select * from results;
+----+------+---------+
| id | user | keyword |
+----+------+---------+
|  1 |    1 | mysql   |
|  2 |    1 | mysql   |
|  3 |    2 | mysql   |
|  4 |    1 | query   |
|  5 |    2 | query   |
|  6 |    2 | query   |
|  7 |    2 | query   |
|  8 |    1 | table   |
|  9 |    2 | table   |
| 10 |    1 | table   |
| 11 |    3 | table   |
| 12 |    3 | mysql   |
| 13 |    3 | query   |
| 14 |    2 | mysql   |
| 15 |    1 | mysql   |
| 16 |    1 | mysql   |
| 17 |    3 | query   |
| 18 |    4 | mysql   |
| 19 |    4 | mysql   |
| 20 |    5 | mysql   |
+----+------+---------+

按关键字和用户分组:

mysql> select * from user_keyword order by keyword, magnitude desc;
+---------+------+-----------+
| keyword | user | magnitude |
+---------+------+-----------+
| mysql   |    1 |         4 |
| mysql   |    2 |         2 |
| mysql   |    4 |         2 |
| mysql   |    3 |         1 |
| mysql   |    5 |         1 |
| query   |    2 |         3 |
| query   |    3 |         2 |
| query   |    1 |         1 |
| table   |    1 |         2 |
| table   |    2 |         1 |
| table   |    3 |         1 |
+---------+------+-----------+

在关键字中排名的用户:

Users ranked within keywords:

mysql> select * from keyword_user_ranked order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql   |    1 |         4 |    1 |
| mysql   |    2 |         2 |    3 |
| mysql   |    4 |         2 |    3 |
| mysql   |    3 |         1 |    5 |
| mysql   |    5 |         1 |    5 |
| query   |    2 |         3 |    1 |
| query   |    3 |         2 |    2 |
| query   |    1 |         1 |    3 |
| table   |    1 |         2 |    1 |
| table   |    3 |         1 |    3 |
| table   |    2 |         1 |    3 |
+---------+------+-----------+------+

每个关键字仅排名前2:

Only top 2 from each keyword:

mysql> select * from keyword_user_ranked where rank <= 2 order by keyword, rank asc;
+---------+------+-----------+------+
| keyword | user | magnitude | rank |
+---------+------+-----------+------+
| mysql   |    1 |         4 |    1 |
| query   |    2 |         3 |    1 |
| query   |    3 |         2 |    2 |
| table   |    1 |         2 |    1 |
+---------+------+-----------+------+


请注意,如果出现并列关系,请在示例中查看用户2和4的关键字"mysql",并列关系的所有各方都将获得最后一位"的排名,即,如果第2位和第3位并列,则均被分配排名3.


Note that when there are ties -- see users 2 and 4 for keyword "mysql" in the examples -- all parties in the tie get the "last" rank, i.e. if the 2nd and 3rd are tied, both are assigned rank 3.

效果:在关键字和用户列中添加索引会有所帮助.我有一个查询表的方式类似,两列的值分别为4000和1300(在600000行表中).您可以像这样添加索引:

Performance: adding an index to the keyword and user columns will help. I have a table being queried in a similar way with 4000 and 1300 distinct values for the two columns (in a 600000-row table). You can add the index like this:

alter table results add index keyword_user (keyword, user);

就我而言,查询时间从大约6秒减少到大约2秒.

In my case, query time dropped from about 6 seconds to about 2 seconds.

这篇关于每个关键字的前'n'个搜索结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆