如何使用mysql找到最相关的行? [英] How to find the most relevant rows using mysql?

查看:38
本文介绍了如何使用mysql找到最相关的行?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含数千行的文章的 MySQL 表.每个文章行最多可以有 3 个 VARCHAR 标签字段:tag1tag2tag3,每个字段都可以为空(用 NA 在数据库中)

I have a MySQL table of articles with several thousands of rows. Each article row can have up to 3 VARCHAR tag fields: tag1, tag2, tag3, each of which can be empty (filled with NA in the database)

article 表架构是这样的:

+------------+--------------+------+-----+-------------------+----------------+
| Field      | Type         | Null | Key | Default           | Extra          |
+------------+--------------+------+-----+-------------------+----------------+
| id         | int(11)      | NO   | PRI | NULL              | auto_increment |
| title      | varchar(100) | NO   |     | NULL              |                |
| body       | longtext     | NO   |     | NULL              |                |
| created_at | datetime     | NO   | MUL | CURRENT_TIMESTAMP |                           |
| visits     | int(11)      | NO   |     | 1                 |                |
| slug       | varchar(100) | NO   |     | NULL              |                |
| lasthit    | datetime     | YES  |     | CURRENT_TIMESTAMP |                |
| tag1       | varchar(100) | NO   |     | NA                |                |
| tag2       | varchar(100) | NO   |     | NA                |                |
| tag3       | varchar(100) | NO   |     | NA                |                |
+------------+--------------+------+-----+-------------------+----------------+

我想为每篇文章找到一个包含 10 篇相关文章 的列表,因此与该文章具有更多相似标签的行应该排在第一位.例如,如果 ArticleA 有:

I'd like to find a list of 10 Relevant Articles for each article, so the rows which have higher number similar tags with that article should come first. So for example, if ArticleA has:

tag1 = "cat"
tag2 = "dog"
tag3 = "fish"

查询应该返回10篇具有相同标签的文章,如果只有4行有这3个标签,剩下的结果必须来自有2个公共标签的行,如果没有足够的行有2个匹配标签,应返回 1 个匹配的标签.

The query should return 10 articles which have the very same tags, and if there are only 4 row having these 3 tags,the remaining results must be from rows have 2 common tags, and if there are not enough rows with 2 matching tags, 1 matching tag should be returned.

我想知道实现这一目标的最佳方法是什么?

I'm wondering what is the best way to achieve this?

推荐答案

所以 newA.* 是包含一些匹配的 10 篇文章的列表.取决于为文章 A 输入的前一篇文章 ID.

So newA.* is a list of 10 articles with some match. The depends on the input of a previous articleID for articleA.

SELECT newA.*
FROM articles articleA
JOIN articles newA
  ON newA.tag1 IN (articleA.tag1, articleA.tag2, articleA.tag3) OR
     newA.tag2 IN (articleA.tag1, articleA.tag2, articleA.tag3) OR
     newA.tag3 IN (articleA.tag1, articleA.tag2, articleA.tag3)
WHERE articleA.id = {some explict number}
   AND newA.id != articleA.id
ORDER BY
  (newA.tag1 IN (articleA.tag1, articleA.tag2, articleA.tag3) IS NOT NULL) +
  (newA.tag2 IN (articleA.tag1, articleA.tag2, articleA.tag3) IS NOT NULL) +
  (newA.tag3 IN (articleA.tag1, articleA.tag2, articleA.tag3) IS NOT NULL)
  DESC
LIMIT 10

ORDER BY 接受每个标签,X IN (...) 是:

The ORDER BY takes each tag, and X IN (...) is either:

  • 1 - 真,值存在
  • 0 - 错误,值不存在
  • NULL - newA.tagX 为空.

'IS NOT NULL' 将 NULL 变为 0,因此可以添加.

The 'IS NOT NULL' turns NULL into 0, so it can be added.

这个查询效率很低,因为它需要扫描整篇文章来确定匹配.

This query is rather inefficient as it requires to scan the entire articles to determine a match.

为了更高效,article_tags 表将 id 映射到具有 id,tagtag> 主键将导致此查询:

To be more efficient a article_tags table that maps the id to a tag that has a id,tag primary key would result in this query:

SELECT newA.*
FROM articles articleA
  JOIN article_tags tagsA ON articleA.id=tagsA.id
  JOIN article_tags newAtags ON tagsA.tag=newA.tag
  JOIN articles newA ON newAtags.id=newA.id
WHERE articleA.id = {some explicit number}
   AND newA.id != articleA.id
GROUP BY newA.id
ORDER BY COUNT(*) DESC
LIMIT 10

这篇关于如何使用mysql找到最相关的行?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆