Need help with SQL for ranking search results

Problem description

I am trying to build a tiny exercise search engine using MySQL.

Each exercise can have an arbitrary number of search tags.

Here is my data structure:

TABLE exercises
  ID
  title

TABLE searchtags
  ID
  title

TABLE exerciseSearchtags
  exerciseID -> exercises.ID
  searchtagID -> searchtags.ID

...where exerciseSearchtags is a many-to-many join table expressing the relationship between exercises and searchtags.
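
For anyone who wants to try the queries in this thread, here is a minimal sketch of that structure. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The column types, the composite primary key, the sample rows and the `tags_for` helper are all assumptions added for illustration, not part of the original schema.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Many-to-many join table: one row per (exercise, tag) pair.
    CREATE TABLE exerciseSearchtags (
        exerciseID  INTEGER NOT NULL REFERENCES exercises(ID),
        searchtagID INTEGER NOT NULL REFERENCES searchtags(ID),
        PRIMARY KEY (exerciseID, searchtagID)
    );
    -- Hypothetical sample data.
    INSERT INTO exercises VALUES (1, 'Push-up'), (2, 'Squat');
    INSERT INTO searchtags VALUES (1, 'chest'), (2, 'arms'), (3, 'legs');
    INSERT INTO exerciseSearchtags VALUES (1, 1), (1, 2), (2, 3);
""")

def tags_for(exercise_id):
    """Return the tag titles attached to one exercise, sorted by title."""
    rows = conn.execute(
        """SELECT searchtags.title
           FROM exerciseSearchtags
           JOIN searchtags ON searchtags.ID = exerciseSearchtags.searchtagID
           WHERE exerciseSearchtags.exerciseID = ?
           ORDER BY searchtags.title""",
        (exercise_id,),
    ).fetchall()
    return [title for (title,) in rows]

print(tags_for(1))  # ['arms', 'chest']
```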

The search engine accepts an unknown number of user-entered keywords.

I would like to rank search results based on the number of keyword / searchtag matches.

Here is the SQL I am currently using to select exercises. Both the CASE rules and the WHERE rules are dynamically generated, one for each keyword. So for example, if a user enters 3 keywords, there will be 3 CASE rules and 3 WHERE rules.

    SELECT 
        exercises.ID AS ID,
        exercises.title AS title, 
        (
            (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)+
            (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)+
            ...etc...
            (CASE WHEN searchtags.title LIKE CONCAT('%',?,'%') THEN 1 ELSE 0 END)
        ) AS relevance

    FROM 
        exercises

    LEFT JOIN exerciseSearchtags
        ON exerciseSearchtags.exerciseID = exercises.ID 

    LEFT JOIN searchtags
        ON searchtags.ID = exerciseSearchtags.searchtagID

    WHERE
        searchtags.title LIKE CONCAT('%',?,'%') OR
        searchtags.title LIKE CONCAT('%',?,'%') OR
        ...etc...
        searchtags.title LIKE CONCAT('%',?,'%') 

    GROUP BY 
        exercises.ID                

    ORDER BY 
        relevance DESC

This almost works. However the results are not being ranked in the order I would expect.

My best guess as to why this is happening is that the relevance score is being calculated BEFORE the rows are grouped by exercises.ID. So if the left join causes a particular exercise to appear 10 times in the result set, and another exercise to appear 4 times, then the first exercise may get a higher relevance score, even though it may not have more keyword / searchtag matches.
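
One way to check that guess is to look at the joined rows before they are grouped. The sketch below runs the same joins and CASE terms without the WHERE and GROUP BY clauses. It uses Python's sqlite3 in place of MySQL, so `'%' || ? || '%'` replaces `CONCAT('%',?,'%')`, and the sample rows and keywords are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
    -- Hypothetical sample data: Push-up has two tags, Squat has one.
    INSERT INTO exercises VALUES (1, 'Push-up'), (2, 'Squat');
    INSERT INTO searchtags VALUES (1, 'chest'), (2, 'arms'), (3, 'legs');
    INSERT INTO exerciseSearchtags VALUES (1, 1), (1, 2), (2, 3);
""")

keywords = ["chest", "arms"]

# One CASE term per keyword, mirroring the dynamically generated rules.
case_terms = " + ".join(
    "(CASE WHEN searchtags.title LIKE '%' || ? || '%' THEN 1 ELSE 0 END)"
    for _ in keywords
)

# Same joins as the query above, minus WHERE and GROUP BY,
# so every joined (exercise, tag) row stays visible.
sql = f"""
    SELECT exercises.title, searchtags.title, {case_terms} AS relevance
    FROM exercises
    LEFT JOIN exerciseSearchtags ON exerciseSearchtags.exerciseID = exercises.ID
    LEFT JOIN searchtags ON searchtags.ID = exerciseSearchtags.searchtagID
    ORDER BY exercises.ID, searchtags.ID
"""
rows = conn.execute(sql, keywords).fetchall()
for row in rows:
    print(row)
# ('Push-up', 'chest', 1)
# ('Push-up', 'arms', 1)
# ('Squat', 'legs', 0)
```

With this data, Push-up matches both keywords, but each of its joined rows only scores 1 because a single row carries a single tag. Since the CASE sum is not wrapped in an aggregate, grouping by exercises.ID can only report the score of one row per exercise (in MySQL, with ONLY_FULL_GROUP_BY disabled, which row is used is not defined), so the per-exercise total never appears.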

Does anyone have any suggestions on how I can fix this?

Thanks (in advance) for your help.

Solution

I have found a working solution to the above problem, and am posting it here, in case anyone else experiences a similar problem.

The solution is to use a sub-select instead of a CASE statement. Here is the above snippet of code, corrected. (I do not know if this is the best or most efficient solution, but it has fixed the trouble for me for the time being, and seems to return search results reasonably quickly.)

SELECT 
    exercises.ID AS ID,
    exercises.title AS title, 
    (
        (
            SELECT COUNT(1) 
            FROM searchtags 
            LEFT JOIN exerciseSearchtags 
            ON exerciseSearchtags.searchtagID = searchtags.ID 
            WHERE searchtags.title LIKE CONCAT('%',?,'%') 
            AND exerciseSearchtags.exerciseID = exercises.ID
        )+
        (
            SELECT COUNT(1) 
            FROM searchtags 
            LEFT JOIN exerciseSearchtags 
            ON exerciseSearchtags.searchtagID = searchtags.ID 
            WHERE searchtags.title LIKE CONCAT('%',?,'%') 
            AND exerciseSearchtags.exerciseID = exercises.ID
        )+
        ...etc...
        (
            SELECT COUNT(1) 
            FROM searchtags 
            LEFT JOIN exerciseSearchtags 
            ON exerciseSearchtags.searchtagID = searchtags.ID 
            WHERE searchtags.title LIKE CONCAT('%',?,'%') 
            AND exerciseSearchtags.exerciseID = exercises.ID
        )
    ) AS relevance

FROM 
    exercises

LEFT JOIN exerciseSearchtags
    ON exerciseSearchtags.exerciseID = exercises.ID 

LEFT JOIN searchtags
    ON searchtags.ID = exerciseSearchtags.searchtagID

WHERE
    searchtags.title LIKE CONCAT('%',?,'%') OR
    searchtags.title LIKE CONCAT('%',?,'%') OR
    ...etc...
    searchtags.title LIKE CONCAT('%',?,'%') 

GROUP BY 
    exercises.ID                

ORDER BY 
    relevance DESC
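
Since the sub-selects and WHERE rules are generated per keyword, here is a rough sketch of how that generation might look. It is written in Python against SQLite rather than MySQL, so `'%' || ? || '%'` stands in for `CONCAT('%',?,'%')`. The function name `build_ranked_query` and the sample data are hypothetical.

```python
import sqlite3

def build_ranked_query(keyword_count):
    """Return SQL with one correlated sub-select and one WHERE rule per keyword."""
    if keyword_count < 1:
        raise ValueError("at least one keyword is required")
    # Counts the tags of the current exercise that match one keyword.
    subselect = """(
            SELECT COUNT(1)
            FROM searchtags
            LEFT JOIN exerciseSearchtags
                ON exerciseSearchtags.searchtagID = searchtags.ID
            WHERE searchtags.title LIKE '%' || ? || '%'
            AND exerciseSearchtags.exerciseID = exercises.ID
        )"""
    relevance = " + ".join([subselect] * keyword_count)
    where = " OR ".join(
        ["searchtags.title LIKE '%' || ? || '%'"] * keyword_count
    )
    return f"""
        SELECT exercises.ID AS ID, exercises.title AS title,
               ({relevance}) AS relevance
        FROM exercises
        LEFT JOIN exerciseSearchtags
            ON exerciseSearchtags.exerciseID = exercises.ID
        LEFT JOIN searchtags
            ON searchtags.ID = exerciseSearchtags.searchtagID
        WHERE {where}
        GROUP BY exercises.ID
        ORDER BY relevance DESC
    """

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exercises (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE searchtags (ID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE exerciseSearchtags (exerciseID INTEGER, searchtagID INTEGER);
    -- Hypothetical sample data.
    INSERT INTO exercises VALUES (1, 'Push-up'), (2, 'Squat'), (3, 'Bench press');
    INSERT INTO searchtags VALUES (1, 'chest'), (2, 'arms'), (3, 'legs');
    INSERT INTO exerciseSearchtags VALUES (1, 1), (1, 2), (2, 3), (3, 1);
""")

keywords = ["chest", "arms"]
# Placeholders appear first in the sub-selects, then in the WHERE rules,
# so the keyword list is bound twice in that order.
results = conn.execute(
    build_ranked_query(len(keywords)), keywords + keywords
).fetchall()
for row in results:
    print(row)
```

The keywords are passed as bound parameters rather than interpolated into the SQL string; only the number of repeated fragments depends on user input.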
