MySQL/PHP:通过标签/分类法查找相似/相关的项目 [英] MySQL / PHP: Find similar / related items by tag / taxonomy

查看:78
本文介绍了MySQL/PHP:通过标签/分类法查找相似/相关的项目的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的城市表.

I have a cities table which looks like this.

|id| Name    |
|1 | Paris   |
|2 | London  |
|3 | New York|

我有一个看起来像这样的标签表.

I have a tags table which looks like this.

|id| tag            |
|1 | Europe         |
|2 | North America  |   
|3 | River          |

和city_tags表:

and a cities_tags table:

|id| city_id | tag_id |
|1 | 1       | 1      | 
|2 | 1       | 3      | 
|3 | 2       | 1      |
|4 | 2       | 3      | 
|5 | 3       | 2      |     
|6 | 3       | 3      |

我该如何计算最密切相关的城市?例如.如果我查看的是城市1(巴黎),则结果应为:伦敦(2),纽约(3)

How do I calculate which are the most closely related city? For example. If I were looking at city 1 (Paris), the results should be: London (2), New York (3)

我找到了 Jaccard索引,但是我不确定如何最好地实现这一点.

I have found the Jaccard index but I'm unsure as how best to implement this.

推荐答案

您对 的疑问如何计算最密切相关的城市?例如.如果我查看的是城市1(巴黎),则结果应为:伦敦(2),纽约(3) ,根据您提供的数据集,只有一件事与之相关:城市之间的公共标签,因此共享公共标签的城市将是最接近的城市.下面是子查询,该子查询可找到共享公共标签的城市(而不是提供城市以找到其最近的城市)

You question about How do I calculate which are the most closely related city? For example. If I were looking at city 1 (Paris), the results should be: London (2), New York (3) and based on your provided data set there is only one thing to relate that is the common tags between the cities so the cities which shares the common tags would be the closest one below is the subquery which finds the cities (other than which is provided to find its closest cities) that shares the common tags

SELECT * FROM `cities`  WHERE id IN (
SELECT city_id FROM `cities_tags` WHERE tag_id IN (
SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1 )

工作

我假设您要输入城市ID或名称之一,以查找与之最接近的城市名称(在我的情况下,巴黎"的ID为

Working

I assume you will input one of the city id or name to find their closest one in my case "Paris" has the id one

 SELECT tag_id FROM `cities_tags` WHERE city_id=1

它将找到巴黎拥有的所有标签id

It will find all the tags id which paris has then

SELECT city_id FROM `cities_tags` WHERE tag_id IN (
    SELECT tag_id FROM `cities_tags` WHERE city_id=1) AND city_id !=1 )

它将获取除巴黎以外具有与巴黎也具有相同标签的所有城市的所有城市

It will fetch all the cities except paris that has the some same tags that paris also has

这是您的> 小提琴

在阅读有关 Jaccard相似度/索引的内容时,我们发现了一些有关术语实际含义的信息,以本示例为例,我们有两组A和A. B

While reading about the Jaccard similarity/index found some stuff to understand about the what actualy the terms is lets take this example we have two sets A & B

设置A = {A,B,C,D,E}

设置B = {I,H,G,F,E,D}

用于计算jaccard相似度的公式为JS =(A与B相交)/(A 联盟B)

Formula to calculate the jaccard similarity is JS=(A intersect B)/(A union B)

A与B相交= {D,E} = 2

联合B = {A,B,C,D,E,I,H,G,F} = 9

JS = 2/9 = 0.2222222222222222

现在转到您的方案

巴黎有tag_ids 1,3,因此我们将其设为集合并称为集合 P = {欧洲,河流}

Paris has the tag_ids 1,3 so we make the set of this and call our Set P ={Europe,River}

伦敦有tag_ids 1,3,因此我们将其设为集合并称为 设置L = {欧洲,河流}

London has the tag_ids 1,3 so we make the set of this and call our Set L ={Europe,River}

纽约有tag_ids 2,3,因此我们将其设为集合并称为 设置NW = {北美,河}

New York has the tag_ids 2,3 so we make the set of this and call our Set NW ={North America,River}

使用伦敦JSPL = P相交L/P并合L来计算JS巴黎, JSPL = 2/2 = 1

Calculting the JS Paris with London JSPL = P intersect L / P union L , JSPL = 2/2 = 1

计算与纽约的JS巴黎JSPNW = P与NW/P相交 联盟NW,JSPNW = 1/3 = 0.3333333333

Calculting the JS Paris with New York JSPNW = P intersect NW / P union NW ,JSPNW = 1/3 = 0.3333333333

到目前为止,这是计算出完美jaccard索引的查询,您可以看到下面的小提琴示例

Here is the query so far which calcluates the perfect jaccard index you can see the below fiddle example

SELECT a.*, 
( (CASE WHEN a.`intersect` =0 THEN a.`union` ELSE a.`intersect` END ) /a.`union`) AS jaccard_index 
 FROM (
SELECT q.* ,(q.sets + q.parisset) AS `union` , 
(q.sets - q.parisset) AS `intersect`
FROM (
SELECT cities.`id`, cities.`name` , GROUP_CONCAT(tag_id SEPARATOR ',') sets ,
(SELECT  GROUP_CONCAT(tag_id SEPARATOR ',')  FROM `cities_tags` WHERE city_id= 1)AS parisset

FROM `cities_tags` 
LEFT JOIN `cities` ON (cities_tags.`city_id` = cities.`id`)
GROUP BY city_id ) q
) a ORDER BY jaccard_index DESC 

在上面的查询中,我已将结果集导出到两个子选择中,以便获取我自定义的计算别名

In above query i have the i have derived the result set to two subselects in order get my custom calculated aliases

您可以在上面的查询中添加过滤条件,而不用计算与自身的相似度

You can add the filter in above query not to calculate the similarity with itself

SELECT a.*, 
( (CASE WHEN a.`intersect` =0 THEN a.`union` ELSE a.`intersect` END ) /a.`union`) AS jaccard_index 
 FROM (
SELECT q.* ,(q.sets + q.parisset) AS `union` , 
(q.sets - q.parisset) AS `intersect`
FROM (
SELECT cities.`id`, cities.`name` , GROUP_CONCAT(tag_id SEPARATOR ',') sets ,
(SELECT  GROUP_CONCAT(tag_id SEPARATOR ',')  FROM `cities_tags` WHERE city_id= 1)AS parisset

FROM `cities_tags` 
LEFT JOIN `cities` ON (cities_tags.`city_id` = cities.`id`) WHERE  cities.`id` !=1
GROUP BY city_id ) q
) a ORDER BY jaccard_index DESC

因此,结果表明巴黎与伦敦密切相关,然后与纽约密切相关

So the result shows Paris is closely related to London and then related to New York

Jaccard相似性提琴

这篇关于MySQL/PHP:通过标签/分类法查找相似/相关的项目的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆