Python - find occurrences of list of strings within string

Question

I have a large string and a list of search strings and want to build a boolean list indicating whether or not each of the search strings exists in the large string. What is the fastest way to do this in Python?

Below is a toy example using a naive approach, but I think it's likely there's a more efficient way of doing this.

For example, the code below should return [1, 1, 0], since both "hello" and "world" exist in the test string.

def check_strings(search_list, input):
    output = []
    for s in search_list:
        # str.find returns -1 when s does not occur in input
        if input.find(s) > -1:
            output.append(1)
        else:
            output.append(0)
    return output

search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings(search_strings, test_string))
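The same naive check can be written more compactly with Python's `in` operator, which performs the same substring test as `find(s) > -1`. This is only a tidier form of the naive approach, not a faster algorithm: it still scans the large string once per search string. The parameter is renamed to `text` here (an editorial choice, not from the question) to avoid shadowing the built-in `input`.

```python
def check_strings(search_list, text):
    # `s in text` is a substring test equivalent to text.find(s) > -1;
    # this still scans `text` once for every search string
    return [int(s in text) for s in search_list]

search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings(search_strings, test_string))  # [1, 1, 0]
```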

Recommended answer

An implementation using the Aho-Corasick algorithm via the pyahocorasick package (https://pypi.python.org/pypi/pyahocorasick/), which finds all of the search strings in a single pass through the input string:

import ahocorasick

def check_strings(search_list, input):
    # Build an Aho-Corasick automaton; each key's value records the
    # search string's position in search_list
    A = ahocorasick.Automaton()
    for idx, s in enumerate(search_list):
        A.add_word(s, (idx, s))
    A.make_automaton()

    # A.iter() makes one pass over input and yields (end_index, value)
    # for every match; flag the position of each matched search string
    output_list = [0] * len(search_list)
    for end_index, (idx, s) in A.iter(input):
        output_list[idx] = 1
    return output_list

search_strings = ["hello", "world", "goodbye"]
test_string = "hello world"
print(check_strings(search_strings, test_string))
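To show what "a single pass" means, here is a minimal pure-Python sketch of the Aho-Corasick technique. It is not pyahocorasick's implementation (that library is written in C and will likely be much faster), and the names `build_automaton`, `goto`, `fail` and `out` are hypothetical, chosen for illustration. The search strings are first compiled into a trie with failure links (each state's longest proper suffix that is also a trie prefix). The input is then scanned once, character by character, so the scan cost grows roughly with the length of the input plus the number of matches rather than with the number of search strings.

```python
from collections import deque

def build_automaton(patterns):
    # goto[state] maps a character to the next trie state; state 0 is the root
    goto, fail, out = [{}], [0], [set()]
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        # out[state] holds the indices of patterns that end at this state
        out[state].add(idx)

    # Breadth-first pass to compute failure links; depth-1 states fail to the root
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # a state also matches every pattern its failure target matches
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def check_strings(search_list, text):
    goto, fail, out = build_automaton(search_list)
    found = set(out[0])  # non-empty only if "" is one of the search strings
    state = 0
    for ch in text:
        # follow failure links until ch has a transition (or we reach the root)
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return [int(i in found) for i in range(len(search_list))]

print(check_strings(["hello", "world", "goodbye"], "hello world"))  # [1, 1, 0]
print(check_strings(["he", "she", "his", "hers"], "ushers"))        # [1, 1, 0, 1]
```

Because `out` stores sets of indices, a search string that appears more than once in `search_list` has every one of its positions marked in this sketch.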