Find the best subset from a list of strings to match a given string
I have a string
s = "mouse"
and a list of strings
sub_strings = ["m", "o", "se", "e"]
I need to find the best and shortest subset of sub_strings that matches s. What is the best way to do this? The ideal result would be ["m", "o", "se"], since together they spell "mose".
You can use a regular expression:
import re
def matches(s, sub_strings):
    sub_strings = sorted(sub_strings, key=len, reverse=True)
    pattern = '|'.join(re.escape(substr) for substr in sub_strings)
    return re.findall(pattern, s)
This is short and fast, but it will not necessarily find the best set of matches, because it is too greedy. For example,
matches("bears", ["bea", "be", "ars"])
returns ["bea"], when it should return ["be", "ars"].
Explanation of the code:
The first line of the function sorts the substrings by length, so that the longest strings appear at the start of the list. This makes sure that the regular expression will prefer longer matches over shorter ones.
The second line creates a regular expression pattern consisting of all the substrings, each escaped with re.escape and separated by the | symbol, which means "or".
The third line uses the re.findall function to find all non-overlapping matches of the pattern in the given string s.
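If the greedy behaviour is a problem, one possible alternative is a small dynamic-programming search. The sketch below is not part of the answer above. It assumes that "best" means covering as many characters of s as possible, with ties broken by using fewer pieces, and that a substring may be reused. The function name best_matches is made up for illustration.

```python
def best_matches(s, sub_strings):
    """Pick non-overlapping pieces, left to right, that cover the most
    characters of s; on ties, prefer fewer pieces.

    Assumptions (not from the original answer): "best" means maximum
    coverage, and each substring may be used more than once.
    """
    # best[i] describes the best choice for the prefix s[:i] as
    # (characters covered, negated piece count, chosen pieces).
    best = [(0, 0, [])]
    for end in range(1, len(s) + 1):
        # Option 1: leave the character s[end - 1] uncovered.
        candidate = best[end - 1]
        # Option 2: end a piece exactly at position `end`.
        for sub in sub_strings:
            start = end - len(sub)
            if sub and start >= 0 and s.startswith(sub, start):
                covered, neg_pieces, chosen = best[start]
                option = (covered + len(sub), neg_pieces - 1, chosen + [sub])
                # Compare coverage first, then (negated) piece count.
                if option[:2] > candidate[:2]:
                    candidate = option
        best.append(candidate)
    return best[-1][2]


print(best_matches("mouse", ["m", "o", "se", "e"]))
print(best_matches("bears", ["bea", "be", "ars"]))
```

Under these assumptions the first call yields ["m", "o", "se"] and the second yields ["be", "ars"]. If each substring may be used only once, or if "best" means something else, the scoring tuple and the search would need to change.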