Python - find occurrences of list of strings within string
Question
I have a large string and a list of search strings and want to build a boolean list indicating whether or not each of the search strings exists in the large string. What is the fastest way to do this in Python?
Below is a toy example using a naive approach, but I think it's likely there's a more efficient way of doing this.
For example, the code below should return [1, 1, 0], since "hello" and "world" both occur in the test string but "goodbye" does not.
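def check_strings(search_list, text):
    # `text` replaces the original parameter name `input`, which shadows a builtin.
    output = []
    for s in search_list:
        # str.find returns -1 when s does not occur in text.
        if text.find(s) > -1:
            output.append(1)
        else:
            output.append(0)
    return output

search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings(search_strings, test_string))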
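As a baseline for comparison, the naive check can also be written more compactly with the `in` operator. This is only a sketch of the same approach, not a faster algorithm: it still scans the large string once per search string. The name `check_strings_compact` is made up for this illustration.

```python
def check_strings_compact(search_list, text):
    # `s in text` is the idiomatic substring test; int() maps True/False to 1/0.
    return [int(s in text) for s in search_list]


search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings_compact(search_strings, test_string))  # [1, 1, 0]
```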
Recommended answer
An implementation using the Aho-Corasick algorithm (https://pypi.python.org/pypi/pyahocorasick/), which finds all of the search strings in a single pass through the large string:
import ahocorasick
import numpy as np

def check_strings(search_list, text):
    # Build the automaton; each word's payload is (index in search_list, word).
    A = ahocorasick.Automaton()
    for idx, s in enumerate(search_list):
        A.add_word(s, (idx, s))
    A.make_automaton()

    # A.iter() yields (end_index, payload) for every match found in one pass.
    index_list = []
    for item in A.iter(text):
        index_list.append(item[1][0])

    # Mark the positions of the search strings that were found.
    output_list = np.array([0] * len(search_list))
    output_list[index_list] = 1
    return output_list.tolist()

search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings(search_strings, test_string))
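To show what "a single pass through the string" means, here is a minimal pure-Python sketch of the Aho-Corasick idea using only the standard library. It is not how pyahocorasick is implemented (that is a C extension and should be much faster); the names `build_automaton` and `check_strings_pure` are hypothetical and exist only for this illustration.

```python
from collections import deque


def build_automaton(patterns):
    # goto[s] maps a character to the next trie state,
    # fail[s] is the fallback state when no transition exists,
    # out[s] holds the indices of all patterns that end at state s.
    goto, fail, out = [{}], [0], [set()]
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append(set())
            state = nxt
        out[state].add(idx)

    # Breadth-first pass to compute failure links (depth-1 states fall back to the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the fallback state (shorter suffix patterns).
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def check_strings_pure(search_list, text):
    goto, fail, out = build_automaton(search_list)
    found = set(out[0])  # an empty search string matches trivially
    state = 0
    for ch in text:  # each character of text is visited exactly once
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return [1 if i in found else 0 for i in range(len(search_list))]


search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings_pure(search_strings, test_string))  # [1, 1, 0]
```

The cost of scanning is proportional to the length of the large string plus the number of matches, rather than to the string length multiplied by the number of search strings, which is why this approach can pay off when the search list is long.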