Word comparison algorithm

Problem description

I am building a CSV import tool for a project I'm working on. The client needs to be able to enter the data in Excel, export it as CSV and upload it to the database. For example, I have this CSV record:

    1, John Doe, ACME Comapny   (the typo is on purpose)

Of course, the companies are kept in a separate table and linked with a foreign key, so I need to look up the correct company ID before inserting. I plan to do this by comparing the company names in the database with the company names in the CSV. The comparison should return 0 if the strings are exactly the same, and a value that grows as the strings become more different, but strcmp doesn't cut it here because:

"Acme Company" and "Acme Comapny" should have a very small difference index, but "Acme Company" and "Cmea Mpnyaco" should have a very big one. "Acme Company" and "Acme Comp." should also have a small difference index, even though their character counts differ. Also, "Acme Company" and "Company Acme" should return 0.
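The word-order and letter-case requirements can be handled separately from typo tolerance by normalizing each name before comparing it. A minimal Python sketch, assuming a hypothetical helper `normalize_company` (not part of the original question): lowercase the name, split it into words, and sort the words.

```python
import re


def normalize_company(name: str) -> str:
    """Hypothetical helper: lowercase, keep only word characters, sort the words."""
    words = re.findall(r"\w+", name.lower())  # punctuation such as "." is dropped
    return " ".join(sorted(words))            # word order no longer matters


print(normalize_company("Acme Company"))   # acme company
print(normalize_company("Company  ACME"))  # acme company
print(normalize_company("Acme Comp."))     # acme comp
```

This makes "Acme Company" and "Company Acme" identical, but "Acme Comp." still differs from "Acme Company", so a typo-tolerant distance measure is still needed on top of the normalized strings.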

So if the client makes a typo while entering data, I could prompt him to choose the name he most probably meant.

Is there a known algorithm to do this, or maybe we can invent one :) ?

Recommended answer

You might want to check out the Levenshtein Distance algorithm as a starting point. It will rate the "distance" between two words.
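As a rough illustration of the suggestion, here is a minimal Python sketch of the classic dynamic-programming Levenshtein distance; the function name `levenshtein` is illustrative. It returns 0 for identical strings and otherwise counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("Acme Company", "Acme Company"))  # 0
print(levenshtein("Acme Company", "Acme Comapny"))  # 2
print(levenshtein("Acme Company", "Acme Comp."))    # 3
```

Both typo cases score low relative to the 12-character name, while a scrambled string such as "Cmea Mpnyaco" scores much higher. On its own, though, Levenshtein distance does not make "Company Acme" score 0 and treats "ACME" and "Acme" as different, so lowercasing and sorting the words of each name before comparing is one way to cover those cases.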

This Stack Overflow thread on implementing a Google-style "Did you mean...?" system may provide some ideas as well.
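For the "Did you mean...?" prompt itself, one option in Python is the standard library's `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` similarity ratio. Note that this is a different measure from Levenshtein distance: higher means more similar, and identical strings score 1.0. In this sketch, the `known_companies` list and the `suggest` helper are hypothetical stand-ins for names loaded from the companies table.

```python
import difflib

# Hypothetical company names, as they might be loaded from the companies table.
known_companies = ["Acme Company", "Apex Corporation", "Globex Inc"]


def suggest(typed: str, candidates: list[str], n: int = 3, cutoff: float = 0.6) -> list[str]:
    """Hypothetical helper: up to n candidates, most similar first."""
    # Compare lowercased names so "ACME" and "Acme" match,
    # but return the spelling stored in the database.
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(typed.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[m] for m in matches]


print(suggest("ACME Comapny", known_companies))
```

Raising `cutoff` returns fewer, closer suggestions; an empty list means nothing was similar enough to propose.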
