Compare group of tags to find similarity/score with PHP/MySQL

Question

How can I compare a group of tags with another post's tags in the database to get related posts?

What I'm trying to do is compare a group of tags on a post to another post's tags, but not each tag individually. So say you wanted to get truly related items based on tags from a post and then show them from the most related to the least related. Each time there have to be three related items shown, no matter the relationship level.

Post A has the tags: "architecture", "wood", "modern", "switzerland"
Post B has the tags: "architecture", "wood", "modern"
Post C has the tags: "architecture", "modern", "stone"
Post D has the tags: "architecture", "house", "residence"

Post B is related to post A by 75% (3 related tags)
Post C is related to post A by 50% (2 related tags)
Post D is related to post A by 25% (1 related tag)
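For reference, the percentages above are the number of shared tags divided by post A's own tag count (3/4, 2/4, 1/4). A minimal Python sketch of that scoring, plus always picking three posts, assuming the tags are already loaded into memory as sets; `POSTS`, `relatedness` and `top_related` are hypothetical names, not part of the question:

```python
# Hypothetical in-memory copy of the example: post id -> set of tag names.
POSTS = {
    "A": {"architecture", "wood", "modern", "switzerland"},
    "B": {"architecture", "wood", "modern"},
    "C": {"architecture", "modern", "stone"},
    "D": {"architecture", "house", "residence"},
}


def relatedness(source: set[str], target: set[str]) -> float:
    """Share of the source post's tags that also appear on the target post."""
    if not source:
        return 0.0
    return len(source & target) / len(source)


def top_related(post_id: str, posts: dict[str, set[str]], limit: int = 3) -> list[tuple[str, float]]:
    """Rank every other post by relatedness, best first, and keep `limit` of them.

    Posts with no shared tags still get a score of 0.0, so the list is filled
    up to `limit` whenever enough other posts exist.
    """
    source = posts[post_id]
    scores = [
        (other_id, relatedness(source, tags))
        for other_id, tags in posts.items()
        if other_id != post_id
    ]
    # Highest score first; ties broken by id so the order is deterministic.
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:limit]


if __name__ == "__main__":
    for other_id, score in top_related("A", POSTS):
        print(f"Post {other_id}: {score:.0%}")
```

Run as a script, this prints B, C and D with 75%, 50% and 25%, matching the numbers above.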

How can I do that? I'm currently using three tables:

posts: id, image, date
post_tags: post_id, tag_id
tags: id, name

I have searched the Internet and Stack Overflow to find out how to do this. My closest find was How to find "related items" in PHP, but it actually didn't solve much for me.

Answer

NOTE: This solution is MySQL only, as MySQL has its own interpretation of GROUP BY

I've also used my own calculation of similarity: the number of identical tags divided by the average tag count of post A and post B. So if post A has 4 tags, and post B has 2 tags which are both shared with A, the similarity is 2/3, about 67%.

$$\text{similarity} = \frac{\text{shared}}{(\text{tags}_A + \text{tags}_B)/2} = \frac{2}{(4 + 2)/2} = \frac{2}{3}$$

It should be easy to change the formula if you want/need to...
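A small Python sketch of the same formula, which can be handy for checking the query's output by hand; `average_similarity` is a hypothetical helper name. Note that shared / ((|A| + |B|) / 2) is the same value as 2 · shared / (|A| + |B|), i.e. the Sørensen–Dice coefficient, so trying another measure (for example Jaccard, shared divided by the size of the tag union) only means changing this one function.

```python
def average_similarity(tags_a: set[str], tags_b: set[str]) -> float:
    """Shared tag count divided by the average tag count of both posts.

    Mirrors the formula above: shared / ((|A| + |B|) / 2),
    which equals 2 * shared / (|A| + |B|).
    """
    total = len(tags_a) + len(tags_b)
    if total == 0:
        return 0.0  # two posts without tags: treat them as unrelated
    shared = len(tags_a & tags_b)
    return shared / (total / 2)


if __name__ == "__main__":
    post_a = {"architecture", "wood", "modern", "switzerland"}
    post_b = {"architecture", "wood"}  # 2 tags, both shared with post_a
    print(round(average_similarity(post_a, post_b), 3))  # 0.667
```

With the question's example, post B would score 3 / 3.5 ≈ 0.857 against post A under this formula, rather than the 75% the question uses.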

SELECT
 sourcePost.id AS source_id,
 targetPost.id AS target_id,

 /* COUNT NUMBER OF IDENTICAL TAGS: ONE JOINED ROW PER SHARED TAG, */
 /* BECAUSE OF THE GROUPING BY sourcePost.id AND targetPost.id BELOW */
 COUNT(targetPost.id) /
 (
  (
   /* TOTAL TAGS IN SOURCE POST */
   (SELECT COUNT(*) FROM post_tags WHERE post_id = sourcePost.id)

   +

   /* TOTAL TAGS IN TARGET POST */
   (SELECT COUNT(*) FROM post_tags WHERE post_id = targetPost.id)

  ) / 2  /* AVERAGE TAGS IN SOURCE + TARGET */
 ) AS similarity
FROM
 posts sourcePost
LEFT JOIN
 post_tags sourcePostTags ON (sourcePost.id = sourcePostTags.post_id)
INNER JOIN
 post_tags targetPostTags ON (sourcePostTags.tag_id = targetPostTags.tag_id
                             AND
                              sourcePostTags.post_id != targetPostTags.post_id)
LEFT JOIN
 posts targetPost ON (targetPostTags.post_id = targetPost.id)
/* FOR A SINGLE POST, ADD E.G.: WHERE sourcePost.id = 1 */
GROUP BY
 sourcePost.id, targetPost.id
/* MOST RELATED FIRST; FOR A SINGLE POST, APPEND LIMIT 3 TO KEEP THE TOP THREE */
ORDER BY
 sourcePost.id, similarity DESC
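To try the query end to end, here is a sketch that loads the question's example posts into an in-memory SQLite database through Python's `sqlite3` module and runs an adapted copy of the query for one post. SQLite is only a stand-in for MySQL here: its `/` does integer division on integers, so the average is divided by `2.0`, and `WHERE`, `ORDER BY` and `LIMIT` are added to return the three most related posts. `build_db`, `related_posts`, `EXAMPLE_TAGS` and the numeric post ids are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE posts (id INTEGER PRIMARY KEY, image TEXT, date TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
"""

# Hypothetical ids for the question's posts A-D.
EXAMPLE_TAGS = {
    1: ["architecture", "wood", "modern", "switzerland"],  # post A
    2: ["architecture", "wood", "modern"],                 # post B
    3: ["architecture", "modern", "stone"],                # post C
    4: ["architecture", "house", "residence"],             # post D
}

# Adapted copy of the answer's query: one source post, best matches first.
# "/ 2.0" keeps the average (and thus the similarity) a real number in SQLite.
RELATED_SQL = """
SELECT targetPost.id AS target_id,
       COUNT(targetPost.id) /
       (((SELECT COUNT(*) FROM post_tags WHERE post_id = sourcePost.id)
       + (SELECT COUNT(*) FROM post_tags WHERE post_id = targetPost.id)) / 2.0)
       AS similarity
FROM posts sourcePost
JOIN post_tags sourcePostTags ON sourcePost.id = sourcePostTags.post_id
JOIN post_tags targetPostTags
  ON sourcePostTags.tag_id = targetPostTags.tag_id
 AND sourcePostTags.post_id != targetPostTags.post_id
JOIN posts targetPost ON targetPostTags.post_id = targetPost.id
WHERE sourcePost.id = ?
GROUP BY sourcePost.id, targetPost.id
ORDER BY similarity DESC, target_id
LIMIT ?
"""


def build_db() -> sqlite3.Connection:
    """Create the three tables in memory and insert the example posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for post_id, names in EXAMPLE_TAGS.items():
        conn.execute("INSERT INTO posts (id) VALUES (?)", (post_id,))
        for name in names:
            # Reuse an existing tag row if the name was already inserted.
            conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
            (tag_id,) = conn.execute(
                "SELECT id FROM tags WHERE name = ?", (name,)
            ).fetchone()
            conn.execute(
                "INSERT INTO post_tags (post_id, tag_id) VALUES (?, ?)",
                (post_id, tag_id),
            )
    return conn


def related_posts(conn: sqlite3.Connection, post_id: int, limit: int = 3) -> list[tuple[int, float]]:
    """Return (target_id, similarity) rows for one post, most related first."""
    return conn.execute(RELATED_SQL, (post_id, limit)).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for target_id, similarity in related_posts(conn, 1):
        print(target_id, round(similarity, 3))
```

Because the self-join only matches posts that share at least one tag, fewer than three rows come back when fewer than three other posts share a tag; to always show three items, as the question requires, the application would have to top up the list with other posts.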
