MySQL查找相关文章 [英] MySQL Find Related Articles

查看：56 发布时间：2020/5/14 22:32:42 sql mysql

本文介绍了MySQL查找相关文章的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正尝试选择最多10条相关文章，其中相关文章是与其他文章具有3个或更多相同关键字的文章.

I'm trying to select a maximum of 10 related articles, where a related article is an article that has 3 or more of the same keywords as the other article.

我的表结构如下:

articles[id, title, content, time]
tags[id, tag]
articles_tags[article_id, tag_id]

我可以在一个查询中选择所有相关的文章ID和标题吗?

Can I select the related articles id and title all in one query?

非常感谢您的帮助.

在何处排除种子"文章

因为我并不在乎我匹配哪个标签，而只是在3个标签上匹配，所以我只需要tag_id并完全避免连接到标签表.因此，现在我将多对多表与其自身相连，以查找重叠的文章.

Where to exclude the 'seed' article

Because I don't care exactly WHICH tags I match on, just THAT I match on 3 tags, I only need tag_id and avoid the join to the tags table completely. So now I join the many-to-many table to itself to find the articles which have an overlap.

问题在于该文章将100％匹配到自己，因此我们需要从结果中消除它.

The problem is that the article will match itself 100% so we need to eliminate that from the results.

您可以通过3种方式排除该记录.您可以将其从表中筛选到联接之前，可以使其脱离联接，也可以在完成后对其进行筛选.

You can exclude that record in 3 ways. You can filter it from the table to before joining, you can have it fall out of the join, or you can filter it when you're finished.

如果在开始加入之前将其消除，则不会获得太多优势.您有成千上万的文章，而您只消除了1.我还认为，基于article_tag映射表的最佳索引，这将无用.

If you eliminate it before you begin the join, you're not gaining much of an advantage. You've got thousands or millions of articles and you're only eliminating 1. I also believe this will not be useful based on the best index for the article_tag mapping table.

如果将其作为连接的一部分进行，则不等式将阻止该子句成为索引扫描的一部分，并在索引扫描之后用作过滤器.

If you do it as part of the join, the inequality will prevent that clause from being part of the index scan and be applied as a filter after the index scan.

将article_tags上的索引视为(Tag_ID，Article_ID).如果我在tag_id = tag_id上将索引与其自身相连，那么我将通过将索引移至我的种子"文章所具有的每个tag_id来立即定义要处理的索引切片.如果添加子句article_id！= article_id，则无法使用索引来定义要处理的切片.这意味着它将被用作过滤器.例如假设我的第一个标签是蓝色".我浏览索引以获取所有具有蓝色"的文章. (当然是通过ID).假设有50行.我们知道1是我的种子文章，49是匹配项.如果我不包括不等式，那么我将包括所有50条记录并继续前进.如果确实包含不等式，则必须检查50条记录中的每条记录，以查看哪些是我的种子，哪些不是我的种子.下一个标签是"Jupiter"，它匹配20,000个文章.再次，我必须检查索引的该切片中的每一行以排除我的种子文章.在经历了2,5,20次(取决于该种子文章的标签)之后，我现在有了一组完全干净的文章来进行COUNT(*)和HAVING处理.如果我不将不等式作为连接的一部分，而只是将SEED ID过滤掉，然后在分组之后将其过滤，那么我只会在很短的列表中执行一次该过滤器.

Consider the index on article_tags as (Tag_ID, Article_ID). If I join the index to itself on tag_id = tag_id then I'll immediately define the slice of the index to process by walking the index to each tag_id my 'seed' article has. If I add the clause article_id != article_id, that can't use the index to define the slice to be processed. That means it will be applied as a filter. e.g. Say my first tag is "BLUE". I walk the index to get all the articles which have "BLUE". (by ID of course). Say there are 50 rows. We know that 1 is my seed article and 49 are matches. If I don't include the inequality, I include all 50 records and move on. If I do include the inequality, I then have to check each of the 50 records to see which is my seed and which isn't. The next tag is "Jupiter" and it matches 20,000 articles. Again I have to check each row in that slice of the index to exclude my seed article. After I go through this 2,5,20 times (depends on tags for that seed article), I now have a completely clean set of articles to do the COUNT(*) and HAVING on. If I don't include the inequality as part of my join but instead just filter the SEED ID out after the group by and having then I only do that filer once on a very short list.

这篇关于MySQL查找相关文章的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

MySQL查找相关文章 [英] MySQL Find Related Articles

问题描述

推荐答案

在何处排除种子"文章

Where to exclude the 'seed' article

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

MySQL查找相关文章 [英] MySQL Find Related Articles

问题描述

推荐答案

在何处排除种子"文章

Where to exclude the 'seed' article

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭