Fuzzy matching of product names

Views: 45 Published: 2021/8/31 18:42:53 Tags: string-matching levenshtein-distance fuzzy-search

Question

I need to automatically match product names (cameras, laptops, TVs, etc.) that come from different sources to a canonical name in the database.

For example, "Canon PowerShot a20IS", "NEW powershot A20 IS from Canon" and "Digital Camera Canon PS A20IS" should all match "Canon PowerShot A20 IS". I've worked with Levenshtein distance plus some added heuristics (removing obvious common words, assigning a higher cost to number changes, etc.), which works to some extent, but unfortunately not well enough.
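For readers who want something concrete, here is a minimal sketch of the kind of heuristic described above: common words are stripped, then an edit distance is computed in which any edit touching a digit costs more. This is not the asker's actual code. The stop-word list, the `digit_cost` value, and the function names `normalize` and `weighted_levenshtein` are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list; a real one would be tuned to the data.
STOP_WORDS = {"new", "digital", "camera", "from"}


def normalize(name):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def weighted_levenshtein(a, b, digit_cost=3.0):
    """Edit distance where any edit involving a digit costs digit_cost.

    All other insertions, deletions, and substitutions cost 1.0.
    The default digit_cost of 3.0 is an arbitrary illustrative choice.
    """
    def cost(*chars):
        return digit_cost if any(c.isdigit() for c in chars) else 1.0

    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [0.0]
    for cb in b:
        prev.append(prev[-1] + cost(cb))

    for ca in a:
        cur = [prev[0] + cost(ca)]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else cost(ca, cb))
            delete = prev[j] + cost(ca)
            insert = cur[j - 1] + cost(cb)
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


# A digit change is penalised more heavily than a letter change...
print(weighted_levenshtein("a20", "a30"))    # 3.0
# ...but a single-letter model change still looks like a near-match.
print(weighted_levenshtein(normalize("Lenovo T400"),
                           normalize("Lenovo R400")))  # 1.0
```

The second call shows why this heuristic only goes so far: the digit penalty does nothing for model names that differ in a letter.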

The main problem is that even a single-letter change in a relevant keyword can make a huge difference, but it is not easy to detect which keywords are relevant. Consider, for example, three product names:
Lenovo T400
Lenovo R400
New Lenovo T-400, Core 2 Duo
The first two are ridiculously similar strings by any standard (OK, Soundex might help to distinguish the T and R in this case, but the names might as well be 400T and 400R). The first and the third are quite far from each other as strings, but are the same product.
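To put numbers on the Lenovo example, the sketch below computes plain (unweighted) Levenshtein distance between the three names. The function name `levenshtein` is just an illustrative choice, and the comparison is done on the raw strings without any preprocessing.

```python
def levenshtein(a, b):
    """Classic edit distance: unit cost for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
            ))
        prev = cur
    return prev[-1]


t400 = "Lenovo T400"
r400 = "Lenovo R400"
t400_long = "New Lenovo T-400, Core 2 Duo"

print(levenshtein(t400, r400))       # 1  (different products)
print(levenshtein(t400, t400_long))  # 17 (same product)
```

The different products sit at distance 1 while the two names for the same product sit at distance 17, which is exactly the inversion described above.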

Obviously, the matching algorithm cannot be 100% precise; my goal is to automatically match around 80% of the names with high confidence.

Any ideas or references are much appreciated.
