用于模糊字符串比较的好的 Python 模块? [英] Good Python modules for fuzzy string comparison?

查看:36
本文介绍了用于模糊字符串比较的好的 Python 模块?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一个可以进行简单的模糊字符串比较的 Python 模块.具体来说,我想要字符串相似程度的百分比.我知道这可能是主观的,所以我希望找到一个可以进行位置比较以及最长相似字符串匹配等的库.

I'm looking for a Python module that can do simple fuzzy string comparisons. Specifically, I'd like a percentage of how similar the strings are. I know this is potentially subjective so I was hoping to find a library that can do positional comparisons as well as longest similar string matches, among other things.

基本上,我希望找到足够简单的东西来产生单个百分比,同时仍然足够可配置,以便我可以指定要进行的比较类型.

Basically, I'm hoping to find something that is simple enough to yield a single percentage while still configurable enough that I can specify what type of comparison(s) to do.

推荐答案

Levenshtein Python 扩展和 C 库.

Levenshtein Python extension and C library.

https://github.com/ztane/python-Levenshtein/

Levenshtein Python C 扩展模块包含用于快速的计算- Levenshtein(编辑)距离和编辑操作- 字符串相似度- 近似中值字符串,一般字符串平均- 字符串序列和集合相似度它支持普通字符串和 Unicode 字符串.

The Levenshtein Python C extension module contains functions for fast computation of - Levenshtein (edit) distance, and edit operations - string similarity - approximate median strings, and generally string averaging - string sequence and set similarity It supports both normal and Unicode strings.

$ pip install python-levenshtein
...
$ python
>>> import Levenshtein
>>> help(Levenshtein.ratio)
ratio(...)
    Compute similarity of two strings.

    ratio(string1, string2)

    The similarity is a number between 0 and 1, it's usually equal or
    somewhat higher than difflib.SequenceMatcher.ratio(), becuase it's
    based on real minimal edit distance.

    Examples:
    >>> ratio('Hello world!', 'Holly grail!')
    0.58333333333333337
    >>> ratio('Brian', 'Jesus')
    0.0

>>> help(Levenshtein.distance)
distance(...)
    Compute absolute Levenshtein distance of two strings.

    distance(string1, string2)

    Examples (it's hard to spell Levenshtein correctly):
    >>> distance('Levenshtein', 'Lenvinsten')
    4
    >>> distance('Levenshtein', 'Levensthein')
    2
    >>> distance('Levenshtein', 'Levenshten')
    1
    >>> distance('Levenshtein', 'Levenshtein')
    0

这篇关于用于模糊字符串比较的好的 Python 模块?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆