Finding multiple substrings in a string without iterating over it multiple times

Views: 67 Published: 2020/11/11 1:44:29 Tags: python string full-text-search

Question

I need to find out whether items from a list appear in a string, and then add the matching items to a different list. This code works:

data = []
line = 'akhgvfalfhda.dhgfa.lidhfalihflaih**Thing1**aoufgyafkugafkjhafkjhflahfklh**Thing2**dlfkhalfhafli...'
_legal = ['thing1', 'thing2', 'thing3', 'thing4',...]
# Each `in` test scans line again, so line is traversed once per item in _legal
for i in _legal:
    if i in line:
        data.append(i)

However, the code iterates over line (which could be long) multiple times: once for every item in _legal (which could be a lot). That's too slow for me, and I'm looking for a faster way. line doesn't have any specific format, so as far as I know .split() won't work. Edit: changed line so that it better represents the problem.

Recommended answer

One way I can think of to improve this is:

1. Get all unique lengths of the words in _legal.
2. Build a set of the substrings of line that have those particular lengths, using a sliding window. This produces O( len(line)*num_of_unique_lengths ) windows (each slice also costs time proportional to its length), which should beat brute force when _legal has many items but few distinct lengths.
3. Now look up each thing from _legal in that set, in O(1) on average.
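As a small illustration of step 2, the sketch below collects every window of one fixed length from a string. The helper name `windows` and the sample string are assumptions made for this example and are not part of the original answer.

```python
def windows(text, n):
    """Return the set of all substrings of text that are exactly n characters long."""
    # A string of length L has L - n + 1 windows of length n, hence the + 1.
    # When n > len(text) the range is empty and so is the result.
    return {text[i:i + n] for i in range(len(text) - n + 1)}


print(sorted(windows('abcde', 3)))  # ['abc', 'bcd', 'cde']
```

Repeating this once per unique length and merging the results gives the lookup set described in the steps above.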

Code:

line = 'thing1 thing2 456 xxualt542l lthin. dfjladjfj lauthina '
_legal = ['thing1', 'thing2', 'thing3', 'thing4', 't5', '5', 'fj la']
# Unique lengths of the words we are looking for
ul = {len(i) for i in _legal}
s = set()
for l in ul:
    # Every substring of line with length l; the + 1 includes the final window
    s |= {line[i:i+l] for i in range(len(line) - l + 1)}
print(s.intersection(_legal))

Output (the order of a set may vary between runs):

{'thing1', 'fj la', 'thing2', 't5', '5'}
