Approximate string matching

Views: 78  Published: 2019/6/6 15:28:52  python


Problem description


I'm searching for a library that does approximate string matching,
for example, searching a dictionary for the word "motorcycle" and
also getting back similar strings like "motorcicle".

Is there such a library?

Solution

This algorithm is called soundex. Here is one implementation example.

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213

here is another:
http://effbot.org/librarybook/soundex.htm

el*******@hotmail.com wrote:

This algorithm is called soundex. Here is one implementation example.

http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213

here is another:
http://effbot.org/librarybook/soundex.htm

Soundex is *one* particular algorithm for approximate
string matching. It is optimised for matching
Anglo-American names (like Smith/Smythe), and is
considered to be quite old and obsolete for all but the
most trivial applications -- or so I'm told.

Soundex will not match arbitrary changes -- it will
match both cat and cet, but it won't match cat and mat.
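
To see why Soundex catches cat/cet but not cat/mat, here is a minimal sketch of the classic American Soundex coding. It is an illustration only, not the code behind either link above; the names `soundex` and `_CODES` are hypothetical, and it assumes plain ASCII letters.

```python
# Consonant groups of the classic American Soundex code (assumed variant).
# Vowels and h, w, y get no digit.
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of word (ASCII letters only)."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()          # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")  # its digit still blocks an adjacent repeat
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":
            # h and w are transparent; vowels reset the previous digit
            prev = digit
    return (code + "000")[:4]          # pad or truncate to 4 characters


print(soundex("cat"), soundex("cet"), soundex("mat"))    # C300 C300 M300
print(soundex("Smith"), soundex("Smythe"))               # S530 S530
print(soundex("motorcycle"), soundex("motorcicle"))      # M362 M362
```

The first letter is never encoded, so any change to it (cat vs. mat) produces a different code, while vowel changes (cat vs. cet) are ignored entirely.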

A more sophisticated approximate string matching
algorithm will use the Levenshtein distance. You can
find a Useless implementation here:

http://www.uselesspython.com/download.php?script_id=108

Given a function levenshtein(s1, s2) that returns the
distance between two strings, you could use it for
approximate matching like this:

def approx_matching(strlist, target, dist=1):
    """Approximately match strings in strlist against
    a target string.

    Returns a list of strings, where each string
    matched is no further than an edit distance of
    dist from the target.
    """
    # levenshtein(s1, s2) is assumed to be defined elsewhere
    found = []
    for s in strlist:
        if levenshtein(s, target) <= dist:
            found.append(s)
    return found

--
Steven.
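
The post above assumes a levenshtein(s1, s2) function without showing one. As a rough sketch (not the implementation from the linked site), the standard dynamic-programming version can be written as follows; the word list and the variable names are made up for illustration.

```python
def levenshtein(s1, s2):
    """Return the edit distance between s1 and s2.

    Counts single-character insertions, deletions and substitutions.
    """
    if len(s1) < len(s2):
        s1, s2 = s2, s1                  # keep the rows as short as possible
    previous = list(range(len(s2) + 1))  # distances from "" to prefixes of s2
    for i, c1 in enumerate(s1, start=1):
        current = [i]                    # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            current.append(min(
                previous[j] + 1,               # delete c1
                current[j - 1] + 1,            # insert c2
                previous[j - 1] + (c1 != c2),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical word list, filtered the same way as approx_matching does.
words = ["motorcycle", "motorcicle", "motorboat", "bicycle"]
close = [w for w in words if levenshtein(w, "motorcycle") <= 1]
print(close)  # ['motorcycle', 'motorcicle']
```

Unlike Soundex, this treats every position the same, so cat and mat are at distance 1 and would match with dist=1. Each comparison costs time proportional to the product of the two string lengths, which may matter when scanning a large dictionary.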
