String similarity algorithms?
Problem description
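I need to compare 2 strings and calculate their similarity, to filter down a list of the most similar strings.
E.g., searching for "dog" would return
- dog
- doggone
- bog
- fog
- foggy
E.g., searching for "crack" would return
- crack
- wisecrack
- rack
- jack
- quack
I have come across:
Do you know of any more string similarity algorithms?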
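Answer
It seems you need some kind of fuzzy matching. Here is a Java implementation of a set of similarity metrics: http://www.dcs.shef.ac.uk/~sam/stringmetrics.html. Here is a more detailed explanation of string metrics: http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf. Which metric to use depends on how fuzzy the matching needs to be and how fast your implementation must be.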
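The fuzzy matching described above can be sketched with Levenshtein (edit) distance, one of the classic string metrics. This is a minimal sketch, not code from the linked Java library: the names `levenshtein`, `similarity` and `most_similar`, the normalization by the longer string's length, and the default `threshold=0.5` are all illustrative assumptions.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row as short as the shorter string
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings.
    (Assumption: dividing by the longer length is one common choice.)"""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def most_similar(query: str, candidates: Iterable[str], threshold: float = 0.5) -> List[str]:
    """Hypothetical helper: keep candidates scoring >= threshold, best first.
    Python's sort is stable, so ties keep their input order."""
    scored = [(similarity(query, c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]
```

With `words` holding the ten example strings from the question, `most_similar("dog", words)` returns `["dog", "bog", "fog"]` and `most_similar("crack", words)` returns `["crack", "rack", "jack", "quack", "wisecrack"]`. "doggone" and "foggy" fall below the threshold because plain edit distance penalizes length differences, which illustrates why the choice of metric depends on how fuzzy the matching must be. Each comparison costs O(m·n) for strings of lengths m and n.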
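When speed matters, a common alternative to computing an edit distance is to compare multisets of character bigrams using the Dice coefficient, 2|A ∩ B| / (|A| + |B|). This is a minimal sketch, assuming case-insensitive bigrams are a good enough signal for your data; `bigrams`, `dice`, `rank` and the default `threshold=0.3` are hypothetical choices, not an API of the linked library.

```python
from collections import Counter
from typing import Iterable, List


def bigrams(s: str) -> Counter:
    """Multiset of adjacent character pairs, e.g. 'dog' -> {'do': 1, 'og': 1}.
    Assumption: comparison is case-insensitive."""
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a: str, b: str) -> float:
    """Dice coefficient on bigram multisets: 2 * |A & B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Both strings have fewer than 2 characters: fall back to equality.
        return 1.0 if a.lower() == b.lower() else 0.0
    overlap = sum((ba & bb).values())  # Counter '&' keeps the minimum count per key
    return 2 * overlap / total


def rank(query: str, candidates: Iterable[str], threshold: float = 0.3) -> List[str]:
    """Hypothetical helper: candidates with Dice score >= threshold, best first
    (stable sort, so ties keep their input order)."""
    scored = [(dice(query, c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]
```

Each comparison is linear in the string lengths, and for a fixed candidate list the bigram counts could be computed once and reused across queries (this sketch recomputes them for simplicity). With `words` holding the ten example strings from the question, `rank("dog", words)` returns `["dog", "doggone", "bog", "fog", "foggy"]` and `rank("crack", words)` returns `["crack", "rack", "wisecrack", "jack", "quack"]`.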