比较标签组以查找与 PHP/MySQL 的相似性/分数 [英] Compare group of tags to find similarity/score with PHP/MySQL

查看:42
本文介绍了比较标签组以查找与 PHP/MySQL 的相似性/分数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何将一组标签与我数据库中另一个帖子的标签进行比较以获取相关帖子?

我要做的是将帖子上的一组标签与另一个帖子的标签进行比较,而不是单独比较每个标签.因此,假设您想根据帖子中的标签获取真正相关的项目,然后从最相关到​​最不相关显示它们.无论关系级别如何,每次都必须显示三个相关项目.

What I'm trying to do is compare a group of tags on a post to another post's tags, but not each tag individually. So say you wanted to get truly related items based on tags from a post and then show them from the most related to the least related. Each time there have to be three related items shown, no matter the relationship level.

帖子 A 的标签有:建筑"、木材"、现代"、瑞士"
帖子 B 的标签有:建筑"、木材"、现代"
Post C 有标签:建筑"、现代"、石头"
帖子 D 有标签:建筑"、房子"、住宅"

Post A has the tags: "architecture", "wood", "modern", "switzerland"
Post B has the tags: "architecture", "wood", "modern"
Post C has the tags: "architecture", "modern", "stone"
Post D has the tags: "architecture", "house", "residence"

帖子 B 与帖子 A 的相关度为 75%(3 个相关标签)
帖子 C 与帖子 A 的相关度为 50%(2 个相关标签)
帖子 D 与帖子 A 的相关度为 25%(1 个相关标签)

Post B is related to post A by 75% (3 related tags)
Post C is related to post A by 50% (2 related tags)
Post D is related to post A by 25% (1 related tag)

我该怎么做?我目前正在使用 3-tables.

How can I do that? I'm currently using a 3-tables.

posts
> id
> image
> date

post_tags
> post_id
> tag_id

tags
> id
> name

我已经搜索了 Internet 和 Stack Overflow 以了解如何执行此操作.我最接近的发现是 如何查找相关项目"在 PHP 中,但它实际上并没有为我解决太多问题.

I have searched the Internet and Stack Overflow to find out how to do this. My closest find was How to find "related items" in PHP, but it actually didn't solve much for me.

推荐答案

注意:此解决方案仅适用于 MySQL,因为 MySQL 对 GROUP BY 有自己的解释

NOTE: This solution is MySQL only, as MySQL has its own interpretation of GROUP BY

我还使用了我自己的相似度计算方法.我将相同标签的数量除以帖子 A 和帖子 B 中的平均标签数.因此,如果帖子 A 有 4 个标签,而帖子 B 有 2 个与 A 共享的标签,则相似度为 66%.

I've also used my own calculation of similarity. I've taken the number of identical tags and divided it by the average tag count in post A and post B. So if post A has 4 tags, and post B has 2 tags which are both shared with A, the similarity is 66%.

(SHARED:2/((A:4 + B:2)/2)(SHARED:2)/(AVG:3)

如果您想要/需要更改公式应该很容易......

It should be easy to change the formula if you want/need to...

SELECT
 sourcePost.id,
 targetPost.id,

 /* COUNT NUMBER OF IDENTICAL TAGS */
 /* REF GROUPING OF sourcePost.id and targetPost.id BELOW */
 COUNT(targetPost.id) /
 (
  (
   /* TOTAL TAGS IN SOURCE POST */
   (SELECT COUNT(*) FROM post_tags WHERE post_id = sourcePost.id)

   +

   /* TOTAL TAGS IN TARGET POST */
   (SELECT COUNT(*) FROM post_tags WHERE post_id = targetPost.id)

  ) / 2  /* AVERAGE TAGS IN SOURCE + TARGET */
 ) as similarity
FROM
 posts sourcePost
LEFT JOIN
 post_tags sourcePostTags ON (sourcePost.id = sourcePostTags.post_id)
INNER JOIN
 post_tags targetPostTags ON (sourcePostTags.tag_id = targetPostTags.tag_id
                             AND 
                              sourcePostTags.post_id != targetPostTags.post_id)
LEFT JOIN
 posts targetPost ON (targetPostTags.post_id = targetPost.id)
GROUP BY
 sourcePost.id, targetPost.id

这篇关于比较标签组以查找与 PHP/MySQL 的相似性/分数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆