What's the optimal solution for tag/keyword matching?

Question

I'm looking for the optimal solution for keyword matching between different records in the database. It's a classic problem; I've found similar questions, but nothing concrete.

I've done it with full-text searches, joins and subqueries, temp tables, ... so I'd really like to see how you guys are solving such a common problem.

So, let's say I have two tables, Products and Keywords, linked through a third table, Products_Keywords, in a classic many-to-many relationship.

If I show one Product record on the page and would like to show the top n related products, what would be the best option?

We should take into account that records might share several keywords, and the number of shared keywords should determine the ordering of the top related products.

I'm open to other ideas as well, but T-SQL would be the preferable solution for performance reasons.

Recommended answer

My first shot would be something like this:

SELECT
    P.product_id,
    COUNT(*) AS shared_keywords          -- number of keywords in common
FROM
    Product_Keywords PK1
INNER JOIN Product_Keywords PK2 ON
    PK2.keyword_id = PK1.keyword_id AND
    PK2.product_id <> PK1.product_id     -- skip the product being viewed
INNER JOIN Products P ON
    P.product_id = PK2.product_id
WHERE
    PK1.product_id = @product_id
GROUP BY
    P.product_id
ORDER BY
    COUNT(*) DESC

The join of Product_Keywords to itself (PK2 to PK1) might be rough, so I can't speak to performance. This is where I would start, though, and then look at optimization.
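To check the logic of the self-join on a small data set, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the related_products helper are made up for illustration, and SQLite is only a stand-in for SQL Server: in T-SQL the `?` placeholders would be `@product_id` and `TOP (@n)` would replace `LIMIT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Product_Keywords (
        product_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, keyword_id)
    );
    -- Hypothetical sample data: product 1 is the one shown on the page.
    INSERT INTO Products VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO Product_Keywords VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 10), (2, 20), (2, 30),
        (3, 10), (3, 40),
        (4, 50);
""")

# Self-join on keyword_id: every PK2 row is another product that shares
# one keyword with the product being viewed (PK1).
RELATED_SQL = """
    SELECT PK2.product_id, COUNT(*) AS shared_keywords
    FROM Product_Keywords PK1
    INNER JOIN Product_Keywords PK2
        ON PK2.keyword_id = PK1.keyword_id
       AND PK2.product_id <> PK1.product_id
    WHERE PK1.product_id = ?
    GROUP BY PK2.product_id
    ORDER BY shared_keywords DESC, PK2.product_id
    LIMIT ?
"""

def related_products(conn, product_id, n):
    """Return up to n (product_id, shared_keywords) rows, best match first."""
    return conn.execute(RELATED_SQL, (product_id, n)).fetchall()

related = related_products(conn, 1, 5)
print(related)  # [(2, 3), (3, 1)]
```

Product 4 shares no keyword with product 1, so it never appears in the result. The join to Products is left out here because only the ids are returned; it would be needed again as soon as product columns such as the name are selected.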

One thing to consider, as a follow-up to Assaf's comment, is that you could add a "weight" column to Product_Keywords and use SUM(PK1.weight) + SUM(PK2.weight) for ranking. Just a thought.

To elaborate on the weighting: you may decide that you want to allow keywords to be weighted. The actual method used to determine the weights would be a business decision, though, so I can't really give you too much guidance there.

As an example, though, this question is about "programming", "keyword matching", and "SQL". Programming is pretty generic, so two questions having it in common still might not mean they are closely related, and maybe you only weight it as 1. SQL is a little more specific, so you might weight it as 5. Keyword matching is both the main focus of the question AND pretty specific, so you might weight it as 10.

This is just an example, of course, and as I said, the exact determination of the weights as well as how you score them depends on the specific business. You might decide that the number of matching keywords is more important than the weights, so maybe the weighting is only used as a tie-breaker, etc. HTH.
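The two scoring choices can be compared side by side. The sketch below again uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, and assumes a weight column on Product_Keywords holding the 1 / 5 / 10 weights from the example. The sample rows, the rank helper, and the two ORDER BY constants are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product_Keywords (
        product_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        weight     INTEGER NOT NULL,
        PRIMARY KEY (product_id, keyword_id)
    );
    -- Keywords: 1 = programming (1), 2 = SQL (5), 3 = keyword matching (10).
    INSERT INTO Product_Keywords VALUES
        (1, 1, 1), (1, 2, 5), (1, 3, 10),  -- the product being viewed
        (2, 1, 1), (2, 2, 5),              -- shares two generic keywords
        (3, 3, 10),                        -- shares one specific keyword
        (4, 1, 1);                         -- shares one generic keyword
""")

SCORE_SQL = """
    SELECT PK2.product_id,
           COUNT(*) AS shared_keywords,
           SUM(PK1.weight) + SUM(PK2.weight) AS score
    FROM Product_Keywords PK1
    INNER JOIN Product_Keywords PK2
        ON PK2.keyword_id = PK1.keyword_id
       AND PK2.product_id <> PK1.product_id
    WHERE PK1.product_id = ?
    GROUP BY PK2.product_id
    ORDER BY {order_by}
"""

# Two fixed orderings; they are constants, never user input.
BY_WEIGHT = "score DESC, PK2.product_id"
BY_COUNT_THEN_WEIGHT = "shared_keywords DESC, score DESC, PK2.product_id"

def rank(conn, product_id, order_by):
    """Return (product_id, shared_keywords, score) rows in the given order."""
    sql = SCORE_SQL.format(order_by=order_by)
    return conn.execute(sql, (product_id,)).fetchall()

by_weight = rank(conn, 1, BY_WEIGHT)
by_count = rank(conn, 1, BY_COUNT_THEN_WEIGHT)
print(by_weight)  # [(3, 1, 20), (2, 2, 12), (4, 1, 2)]
print(by_count)   # [(2, 2, 12), (3, 1, 20), (4, 1, 2)]
```

With pure weighting, product 3 wins on a single highly specific keyword. With the count first and the weight as a tie-breaker, product 2 moves to the top, and the weight only decides between products 3 and 4, which share one keyword each.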
