Find the best subset from a list of strings to match a given string

Views: 558 Published: 2015/11/30 20:52:07 Tags: python string algorithm matching fuzzy-search


Problem description


I have a string

s = "mouse"

and a list of strings

sub_strings = ["m", "o", "se", "e"]

I need to find the best and shortest subset of sub_strings that matches s. What is the best way to do this? The ideal result would be ["m", "o", "se"], since together they spell "mose".

Solution

You can use a regular expression:

import re

def matches(s, sub_strings):
    sub_strings = sorted(sub_strings, key=len, reverse=True)
    pattern = '|'.join(re.escape(substr) for substr in sub_strings)
    return re.findall(pattern, s)

This is at least short and quick, but it will not necessarily find the best set of matches; it is too greedy. For example,

matches("bears", ["bea", "be", "ars"])

returns ["bea"], when it should return ["be", "ars"].
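One way around the greediness is dynamic programming over the positions of s. The sketch below is not part of the original answer. It assumes that "best" means covering as many characters of s as possible and, among equal coverage, using the fewest pieces. It also assumes that a substring may be used more than once, as in the regex version. The name best_matches is hypothetical.

```python
def best_matches(s, sub_strings):
    """Pick non-overlapping pieces of s, left to right, that cover the most
    characters; ties are broken in favour of fewer pieces (assumed criteria)."""
    # Drop duplicates and empty strings while keeping the original order.
    subs = [sub for sub in dict.fromkeys(sub_strings) if sub]
    n = len(s)
    # best[i] = (characters covered, -number of pieces, pieces) for s[i:]
    best = [(0, 0, [])] * (n + 1)
    for i in range(n - 1, -1, -1):
        candidate = best[i + 1]  # option: leave s[i] unmatched
        for sub in subs:
            if s.startswith(sub, i):
                covered, neg_count, pieces = best[i + len(sub)]
                option = (covered + len(sub), neg_count - 1, [sub] + pieces)
                # Compare coverage first, then (negated) piece count.
                if option[:2] > candidate[:2]:
                    candidate = option
        best[i] = candidate
    return best[0][2]


print(best_matches("bears", ["bea", "be", "ars"]))    # ['be', 'ars']
print(best_matches("mouse", ["m", "o", "se", "e"]))   # ['m', 'o', 'se']
```

This does work proportional to len(s) times the number of substrings, so it is slower than a single regex scan, but it considers every way of placing the pieces instead of committing to the first one that fits.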

Explanation of the code:

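The first line of the function sorts the substrings by length, so that the longest strings appear at the start of the list. Because Python's regex alternation tries the alternatives from left to right, this makes sure that the regular expression will prefer longer matches over shorter ones at each position.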
The second line creates a regular expression pattern consisting of all the substrings, separated by the | symbol, which means "or".
The third line just uses the re.findall function to find all matches of the pattern in the given string s.
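To make the three steps concrete, here is a small trace of the intermediate values for the "mouse" example from the question. The variable name ordered is introduced only for illustration.

```python
import re

sub_strings = ["m", "o", "se", "e"]

# Step 1: longest first; sorted() is stable, so equal lengths keep their order.
ordered = sorted(sub_strings, key=len, reverse=True)
print(ordered)    # ['se', 'm', 'o', 'e']

# Step 2: escape each substring and join them with "|" (alternation).
pattern = "|".join(re.escape(sub) for sub in ordered)
print(pattern)    # se|m|o|e

# Step 3: collect every non-overlapping match, scanning left to right.
result = re.findall(pattern, "mouse")
print(result)     # ['m', 'o', 'se']
```

The "u" in "mouse" matches no alternative, so the scan skips it and continues with "se".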
