如何在python中生成一组相似的字符串 [英] how to generate a set of similar strings in python

查看：258 发布时间：2020/5/4 10:06:14 python string python-3.x machine-learning string-matching

本文介绍了如何在python中生成一组相似的字符串的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想知道如何基于Levenshtein distance(字符串编辑距离)生成一组相似的字符串.理想情况下，我喜欢传递源字符串(即用于生成与其相似的其他字符串的字符串)，需要生成的字符串数和阈值作为参数，即，生成的集合应大于阈值.我想知道应该使用什么Python软件包?或任何想法如何实现这一目标?

I am wondering how to generate a set of similar strings based on Levenshtein distance (string edit distance). Ideally, I like to pass in, a source string (i.e. a string which is used to generate other strings that are similar to it), the number of strings need to be generated and a threshold as parameters, i.e. similarities among the strings in the generated set should be greater than the threshold. I am wondering what Python package(s) should I use to achieve that? Or any idea how to implement this?

推荐答案

我认为您可以用另一种方式来思考问题(反向).

I think you can think of the problem in another way (reversed).

给出一个字符串，说它是 sittin .
给出一个阈值(编辑距离)，说它是k.
然后您以k个步骤应用不同编辑"的组合.

Given a string, say it is sittin.
Given a threshold (edit distance), say it is k.
Then you apply combinations of different "edits" in k-steps.

例如，假设k =2.并假设您拥有允许的编辑模式是:

For example, let's say k = 2. And assume the allowed edit modes you have are:

删除一个字符
添加一个字符
用一个字符替换另一个字符.

然后逻辑如下:

input = 'sittin'
for num in 1 ... n:  # suppose you want to have n strings generated
  my_input_ = input
  # suppose the edit distance should be smaller or equal to k;
  # but greater or equal to one
  for i in in 1 ... randint(k): 
    pick a random edit mode from (delete, add, substitute)
    do it! and update my_input_

如果您需要使用预定义的字典，那么这会增加一些复杂性，但是仍然可以实现.在这种情况下，编辑必须有效.

If you need to stick with a pre-defined dictionary, that adds some complexity but it is still doable. In this case, the edit must be valid.

这篇关于如何在python中生成一组相似的字符串的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

如何在python中生成一组相似的字符串 [英] how to generate a set of similar strings in python

问题描述

推荐答案

相关文章

AI人工智能最新文章

热门教程

热门工具

登录关闭

如何在python中生成一组相似的字符串 [英] how to generate a set of similar strings in python

问题描述

推荐答案

相关文章

AI人工智能最新文章

热门教程

热门工具

登录 关闭

登录关闭