Need help with SQL for ranking search results
I am trying to build a tiny exercise search engine using MySQL.
Each exercise can have an arbitrary number of search tags.
Here is my data structure:
TABLE exercises
    ID
    title
TABLE searchtags
    ID
    title
TABLE exerciseSearchtags
    exerciseID -> exercises.ID
    searchtagID -> searchtags.ID
...where exerciseSearchtags is a many-to-many join table expressing the relationship between exercises and searchtags.
The search engine accepts an unknown number of user-entered keywords.
I would like to rank search results based on the number of keyword / searchtag matches.
Here is the SQL I am currently using to select exercises. Both the CASE rules and the WHERE rules are dynamically generated, one for each keyword. So, for example, if a user enters 3 keywords, there will be 3 CASE rules and 3 WHERE rules.
SELECT
    exercises.ID AS ID,
    exercises.title AS title,
    (
        (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)+
        (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)+
        ...etc...
        (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)
    ) AS relevance
FROM
    exercises
    LEFT JOIN exerciseSearchtags
        ON exerciseSearchtags.exerciseID = exercises.ID
    LEFT JOIN searchtags
        ON searchtags.ID = exerciseSearchtags.searchtagID
WHERE
    searchtags.title LIKE CONCAT('%',?,'%') OR
    searchtags.title LIKE CONCAT('%',?,'%') OR
    ...etc...
    searchtags.title LIKE CONCAT('%',?,'%')
GROUP BY
    exercises.ID
ORDER BY
    relevance DESC
This almost works. However, the results are not ranked in the order I would expect.
My best guess as to why this is happening is that the relevance score is calculated BEFORE the rows are grouped by exercises.ID. So if the left join causes a particular exercise to appear 10 times in the result set, and another exercise to appear 4 times, then the first exercise may get a higher relevance score, even though it may not have more keyword / searchtag matches.
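To make the grouping issue concrete, here is a minimal sketch of one alternative: wrapping each CASE rule in SUM() so the score is aggregated per exercise rather than taken from a single joined row. This is not the approach used in the answer further down, only an illustration. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (so `'%' || ? || '%'` replaces CONCAT), and the sample exercises, tags, and keywords are invented for the demonstration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
    -- Hypothetical sample data.
    INSERT INTO exercises VALUES (1, 'Squat'), (2, 'Leg press');
    INSERT INTO searchtags VALUES (1, 'legs'), (2, 'strength'), (3, 'barbell');
    INSERT INTO exerciseSearchtags VALUES (1, 1), (1, 2), (1, 3), (2, 1);
""")

keywords = ["leg", "strength"]

# SQLite spelling of the CASE and WHERE rules from the question.
case_rule = "CASE WHEN searchtags.title LIKE '%' || ? || '%' THEN 1 ELSE 0 END"
where_rule = "searchtags.title LIKE '%' || ? || '%'"

# One SUM(CASE ...) per keyword: each sum runs over all rows of the group.
relevance = " + ".join(f"SUM({case_rule})" for _ in keywords)
where_clause = " OR ".join(where_rule for _ in keywords)

sql = f"""
    SELECT exercises.ID, exercises.title, {relevance} AS relevance
    FROM exercises
    JOIN exerciseSearchtags ON exerciseSearchtags.exerciseID = exercises.ID
    JOIN searchtags ON searchtags.ID = exerciseSearchtags.searchtagID
    WHERE {where_clause}
    GROUP BY exercises.ID, exercises.title
    ORDER BY relevance DESC
"""

# Placeholders appear first in the SELECT list, then in WHERE,
# so the keyword list is bound twice in that order.
rows = conn.execute(sql, keywords + keywords).fetchall()
for row in rows:
    print(row)
```

With this sample data, 'Squat' matches both keywords and 'Leg press' matches one, so the aggregated score reflects the number of matches per exercise rather than whichever joined row happens to represent the group.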
Does anyone have suggestions on how I can fix this?
Thanks (in advance) for your help.
I have found a working solution to the above problem and am posting it here in case anyone else runs into something similar.
The solution is to use a sub-select instead of a CASE expression. Here is the above snippet of code, corrected. (I do not know if this is the best or most efficient solution, but it has fixed the trouble for me for the time being, and it seems to return search results reasonably quickly.)
SELECT
    exercises.ID AS ID,
    exercises.title AS title,
    (
        (
            SELECT COUNT(1)
            FROM searchtags
            LEFT JOIN exerciseSearchtags
                ON exerciseSearchtags.searchtagID = searchtags.ID
            WHERE searchtags.title LIKE CONCAT('%',?,'%')
                AND exerciseSearchtags.exerciseID = exercises.ID
        )+
        (
            SELECT COUNT(1)
            FROM searchtags
            LEFT JOIN exerciseSearchtags
                ON exerciseSearchtags.searchtagID = searchtags.ID
            WHERE searchtags.title LIKE CONCAT('%',?,'%')
                AND exerciseSearchtags.exerciseID = exercises.ID
        )+
        ...etc...
        (
            SELECT COUNT(1)
            FROM searchtags
            LEFT JOIN exerciseSearchtags
                ON exerciseSearchtags.searchtagID = searchtags.ID
            WHERE searchtags.title LIKE CONCAT('%',?,'%')
                AND exerciseSearchtags.exerciseID = exercises.ID
        )
    ) AS relevance
FROM
    exercises
    LEFT JOIN exerciseSearchtags
        ON exerciseSearchtags.exerciseID = exercises.ID
    LEFT JOIN searchtags
        ON searchtags.ID = exerciseSearchtags.searchtagID
WHERE
    searchtags.title LIKE CONCAT('%',?,'%') OR
    searchtags.title LIKE CONCAT('%',?,'%') OR
    ...etc...
    searchtags.title LIKE CONCAT('%',?,'%')
GROUP BY
    exercises.ID
ORDER BY
    relevance DESC
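For anyone who wants to try the sub-select approach end to end, the following is a rough sketch of how the query could be generated for any number of keywords. It assumes Python's built-in sqlite3 module rather than MySQL (hence `'%' || ? || '%'` in place of CONCAT); the function name `rank_exercises` and the sample data are hypothetical and not part of the original code.

```python
import sqlite3


def rank_exercises(conn, keywords):
    """Return (ID, title, relevance) rows, most keyword/searchtag matches first."""
    if not keywords:
        return []  # no keywords means no rules to generate

    like = "searchtags.title LIKE '%' || ? || '%'"
    # One correlated sub-select per keyword: counts this exercise's matching tags.
    subselect = f"""(
        SELECT COUNT(1)
        FROM searchtags
        JOIN exerciseSearchtags
            ON exerciseSearchtags.searchtagID = searchtags.ID
        WHERE {like}
            AND exerciseSearchtags.exerciseID = exercises.ID
    )"""
    relevance = " + ".join(subselect for _ in keywords)
    where = " OR ".join(like for _ in keywords)

    sql = f"""
        SELECT exercises.ID, exercises.title, {relevance} AS relevance
        FROM exercises
        LEFT JOIN exerciseSearchtags
            ON exerciseSearchtags.exerciseID = exercises.ID
        LEFT JOIN searchtags
            ON searchtags.ID = exerciseSearchtags.searchtagID
        WHERE {where}
        GROUP BY exercises.ID, exercises.title
        ORDER BY relevance DESC
    """
    # Bind the keywords twice: once for the sub-selects, once for the WHERE rules.
    return conn.execute(sql, list(keywords) * 2).fetchall()


# Hypothetical sample data: 'Burpee' has many tags but only one match,
# 'Plank' has fewer tags but matches both keywords.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
    INSERT INTO exercises VALUES (1, 'Burpee'), (2, 'Plank');
    INSERT INTO searchtags VALUES
        (1, 'cardio'), (2, 'core'), (3, 'bodyweight'),
        (4, 'full body'), (5, 'isometric');
    INSERT INTO exerciseSearchtags VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 5);
""")

results = rank_exercises(conn, ["core", "isometric"])
for row in results:
    print(row)
```

In this sketch the exercise with more tags does not outrank the one with more matches, which is the behaviour the sub-selects are meant to give. Since each keyword adds another correlated sub-select, it may be worth checking performance on larger tables.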