Matching end of string

Question

I'm looking for the most efficient way to match the end of a single string against a value from a predefined list of strings.
Something like:

my_str='QWERTY'
my_lst=['QWE','QQQQ','TYE','YTR','TY']  

match='TY'  # or match=['TY']

Under the following restrictions:

len(my_lst) is known but arbitrary, so it could be long, probably around 30
elements in my_lst may have different lengths, so I can't just check a fixed-size tail of my_str every time
my_str and the matching elements in my_lst can be either strings or lists, whichever is more efficient (see background)
len(my_str) is mostly small, no longer than 8 characters
the in operator wouldn't do, as I need the match to occur exclusively at the end
endswith is no use on its own, since it only returns a Boolean
the match should always be unique or [], as no elements in my_lst share an ending with one another

A little background (may be skipped):
I started with this as a list problem, with input such as ['Q','W','E','R','T','Y'], where I would have a list of lists of 1-character strings to match against, and I was thinking of iterating in reverse with [::-1] to check every candidate.
Then I realized it was possible to concatenate the inner lists, since they contained only strings, and run the same logic on the resulting strings.
Finally I came across the endswith string method while reading this question, but it wasn't quite what I needed. Furthermore, my problem can't be generalized and solved with the os module or similar, since it's a string problem, not a path one.
End of background.
I approached it in these two ways:

match=filter(lambda x: my_str.endswith(x), my_lst)
match=[x for x in my_lst if my_str.endswith(x)]

I succeeded but I would like to know if there is some built-in or best way to find and return the matched ending value.

Thanks.

Recommended answer

Here's a way using a trie, or prefix tree (in this situation it is built over the reversed strings, so it effectively indexes suffixes). If we had three potential suffixes CA, CB, and BA, our tree would look like

     e
    / \
  A     B
 / \    |
B   C   C

(e is the empty string.) We start at the end of the input string and consume characters one at a time. If we reach the beginning of the string, or meet a character that is not a child of the current node, we reject the string. If we reach a leaf of the tree, we accept the string. Accepting only at leaves relies on the restriction in the question that no suffix in the list is itself the ending of another. This lets us scale better to very many potential suffixes.

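def build_trie(suffixes):
    # Nested dicts: each key is a character, each value is the sub-trie below it.
    head = {}
    for suffix in suffixes:
        curr = head
        # Insert the suffix back to front, so lookups can walk from the end of a string.
        for c in reversed(suffix):
            if c not in curr:
                curr[c] = {}
            curr = curr[c]
    return head

def is_suffix(trie, s):
    # An empty trie matches the empty suffix of every string.
    if not trie:
        return True
    for c in reversed(s):
        try:
            trie = trie[c]
        except KeyError:
            # c is not a child of the current node.
            return False
        if not trie:
            # Reached a leaf: a complete suffix has been consumed.
            return True
    # Ran out of input before reaching a leaf.
    return False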

trie = build_trie(['QWE','QQQQ','TYE','YTR','TY'])

gives us

{'E': {'W': {'Q': {}}, 
       'Y': {'T': {}}},
 'Q': {'Q': {'Q': {'Q': {}}}},
 'R': {'T': {'Y': {}}},
 'Y': {'T': {}}}
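
To make the walk concrete, here is a minimal sketch that checks a few strings against the trie shown above. The dict literal is copied from that output; ends_with_any is a hypothetical helper name used only for this illustration, and unlike is_suffix it does not special-case an empty trie.

```python
# Trie for ['QWE', 'QQQQ', 'TYE', 'YTR', 'TY'], copied from the output above.
trie = {
    'E': {'W': {'Q': {}}, 'Y': {'T': {}}},
    'Q': {'Q': {'Q': {'Q': {}}}},
    'R': {'T': {'Y': {}}},
    'Y': {'T': {}},
}

def ends_with_any(trie, s):
    """Hypothetical helper: True if s ends with one of the trie's suffixes."""
    node = trie
    for c in reversed(s):
        if c not in node:
            return False      # c is not a child of the current node
        node = node[c]
        if not node:
            return True       # reached a leaf: a whole suffix was consumed
    return False              # ran out of input before reaching a leaf

print(ends_with_any(trie, 'QWERTY'))  # True  (walks Y, then T, and hits a leaf)
print(ends_with_any(trie, 'QWERTZ'))  # False (Z is not a child of the root)
print(ends_with_any(trie, 'Y'))       # False (input ends before a leaf)
```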

If you want to return the matching suffix, that's just a matter of tracking the characters we see as we descend the trie.

def has_suffix(trie, s):
    if not trie:
        return ''
    letters = []
    for c in reversed(s):
        try:
            trie = trie[c]
            letters.append(c)
        except KeyError:
            return None
        if not trie:
            # letters were collected back to front, so restore the original order
            return ''.join(reversed(letters))
    return None
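
As a usage sketch, the block below returns the matched ending for the example from the question and compares it with the list-comprehension approach. It is self-contained: build_trie is re-declared compactly, and find_suffix is a hypothetical helper (not from the answer) that recovers the match by slicing s, a choice made here so that the characters come back in their original order even though the walk visits them back to front.

```python
def build_trie(suffixes):
    """Nested-dict trie keyed on each suffix's characters, last character first."""
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
    return head

def find_suffix(trie, s):
    """Hypothetical helper: return the suffix of s stored in trie, or None."""
    node = trie
    for depth, c in enumerate(reversed(s), start=1):
        if c not in node:
            return None
        node = node[c]
        if not node:
            # Leaf reached after `depth` characters; slicing keeps original order.
            return s[len(s) - depth:]
    return None

my_str = 'QWERTY'
my_lst = ['QWE', 'QQQQ', 'TYE', 'YTR', 'TY']

trie = build_trie(my_lst)
baseline = [x for x in my_lst if my_str.endswith(x)]  # approach from the question

print(find_suffix(trie, my_str))  # TY
print(baseline)                   # ['TY']
```

For a list of about 30 short suffixes, the comprehension is likely fast enough; the trie mainly pays off when the list grows much larger or is reused across many input strings.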

It's worth noting that the empty trie can be reached by both build_trie(['']) and build_trie([]), and that it matches the empty string at the end of every string. To avoid this, you could check the length of suffixes and return some non-dict value, which you would then check for in has_suffix.
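
One possible shape for that guard is sketched below. The names build_trie_checked and match_suffix are hypothetical, None is just one choice of non-dict sentinel, and dropping empty strings from the input is an extra assumption made here so that [''] is handled the same way as [].

```python
def build_trie_checked(suffixes):
    """Hypothetical variant: returns None instead of an empty trie."""
    # Assumption: empty strings carry no information, so they are dropped.
    suffixes = [suffix for suffix in suffixes if suffix]
    if not suffixes:
        return None  # non-dict sentinel meaning "nothing to match against"
    head = {}
    for suffix in suffixes:
        curr = head
        for c in reversed(suffix):
            curr = curr.setdefault(c, {})
    return head

def match_suffix(trie, s):
    """Hypothetical helper: matched suffix of s, or None if there is none."""
    if trie is None:  # sentinel check: never treat "no suffixes" as a match
        return None
    node = trie
    for depth, c in enumerate(reversed(s), start=1):
        if c not in node:
            return None
        node = node[c]
        if not node:
            return s[len(s) - depth:]
    return None

print(match_suffix(build_trie_checked([]), 'QWERTY'))      # None
print(match_suffix(build_trie_checked(['']), 'QWERTY'))    # None
print(match_suffix(build_trie_checked(['TY']), 'QWERTY'))  # TY
```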
