比较数据集并返回最佳匹配 [英] compare data sets and return best match

查看:82
本文介绍了比较数据集并返回最佳匹配的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在mysql中,我使用联接表"将标签分配给项目.我想查看哪些项目与正在查看的项目具有最相似的标签.

In mysql, I am using "join tables" to assign tags to items. I would like to see which items have the most similar tags to the item being viewed.

例如,假设感兴趣的项目已被标记为酷",汽车"和红色".我想搜索带有这些标签的其他物品.我想查看标记为汽车"的项目,但我希望标记为汽车"和红色"的项目位于仅标记为汽车"的项目上方.我希望带有相同标签的项目位于结果的顶部.

For example, let's say the item of interest has been tagged "cool", "cars" and "red". I would like to search for other items with those tags. I want to see items that have been tagged "cars", but I want items that were tagged "cars" and "red" to be above the item that was only tagged "cars". I want items with identical tags to be at the top of the results.

是否有某种方法可以使用IN将数据集(子查询)与另一个数据集(子查询)进行比较?或者,是否有一些技巧可以使用GROUP BY和GROUP_CONCAT()将它们评估为逗号分隔的列表?

Is there some way to compare a data set (subquery) with another data set (subquery) using IN? Alternately, is there some trick I can use to evaluate them as comma separated lists using GROUP BY and GROUP_CONCAT()?

推荐答案

如果您向我们展示您的表结构,这会有所帮助,所以我可以更具体一些.

It would help if you show us your table structures, so I can be more specific.

我假设您具有类似以下的结构:

I'm assuming you've got a structure that resembles this:

Table item: (id, itemname)
1 item1
2 item2
3 item3
4 item4
5 item5

Table tag: (id, tagname)
1 cool
2 red
3 car

Table itemtag: (id, itemid, tagid)
1 1 2 (=item1, red)
2 2 1 (=item2, cool)
3 2 3 (=item2, car)
4 3 1 (=item3, cool)
5 3 2 (=item3, red)
6 3 3 (=item3, car)
7 4 3 (=item3, car)
8 5 3 (=item3, car)

通常,我的方法是从对每个单独的标签进行计数开始.

In general my approach would be to start out by counting each separate tag.

-- make a list of how often a tag was used:
select tagid, count(*) as `tagscore` from itemtag group by tagid

这将显示分配给该商品的每个标签的一行,并带有一个分数.

This shows a row for each tag that was assigned to the item, with a score.

在我们的示例中,将是:

In our example, that would be:

tag  tagscore
1    2         (cool, 2x)
2    2         (red, 2x)
3    4         (car, 4x)


set @ItemOfInterest=2;

select
  itemname,
  sum(tagscore) as `totaltagscore`,
  GROUP_CONCAT(tags) as `tags`
from
  itemtag
join item on itemtag.itemid=item.id

join
  /* join the query from above (scores per tag) */
  (select tagid, count(*) as `tagscore` from itemtag group by tagid ) as `TagScores`
  on `TagScores`.tagid=itemtag.tagid
where
  itemid<>@ItemOfInterest and 
  /* get the taglist of the current item */
  tagid in (select distinct tagid from itemtag where itemid=@ItemOfInterest)
group by
  itemid
order by
  2 desc

说明: 该查询具有2个子查询: 一种是从感兴趣的项目中获取列表标签.我们只想和那些人一起工作. 另一个子查询会生成每个标签的分数列表.

Explanation: The query has 2 subqueries: One is to obtain the list tags from the item of interest. We only want to work with those. The other subquery generates a list of scores per tag.

因此,最后,数据库中的每个项目都有一个标签分数列表.这些分数与sum(tagscore)相加,然后使用该数字对结果进行排序(最高分数排在最前面).

So in the end, each item in the database has a list of tag scores. Those scores are added up with sum(tagscore), and that number is used to order the result (highest scores on top).

为了显示可用标签的列表,我使用了GROUP_CONCAT.

To show a list of available tags, I've used GROUP_CONCAT.

查询将导致类似这样的事情(我已经在此处制作了实际数据):

The query will result in something like this (I've made the actual data up here):

Item   TagsScore   Tags
item3  15          red,cool,car
item4   7          red,car
item5   7          red
item1   5          car
item6   5          car 

这篇关于比较数据集并返回最佳匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆