Python string search efficiency

Question

For very large strings (spanning multiple lines) is it faster to use Python's built-in string search or to split the large string (perhaps on \n) and iteratively search the smaller strings?

E.g., for very large strings:

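# Option 1: split on newlines and search each line in a Python loop
def search_by_line():
    for l in get_mother_of_all_strings().split('\n'):
        if 'target' in l:
            return True
    return False

# Option 2: search the whole string in a single call
def search_whole():
    return 'target' in get_mother_of_all_strings()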

Recommended answer

Certainly the second. The search itself costs about the same whether it runs over one big string or over many small ones. Shorter lines may let the search skip a few characters, but the split has its own costs (scanning for \n, creating n separate strings, building the list), and the loop then runs in Python.

The string __contains__ method is implemented in C (in CPython), so it is noticeably faster than a loop written in Python.
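
As a small illustration (a sketch added for clarity, not part of the original answer), the `in` operator on strings dispatches to `str.__contains__`, and it agrees with a `str.find` check. The sample `text` below is made up.

```python
# Hypothetical sample text; any multi-line string would do.
text = "first line\nsecond line with target\nthird line"

# The `in` operator dispatches to str.__contains__.
via_operator = "target" in text
via_method = text.__contains__("target")

# str.find returns the index of the first match, or -1 if there is none.
via_find = text.find("target") != -1

print(via_operator, via_method, via_find)
```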

Also consider that the second method stops as soon as the first match is found, whereas the first one splits the entire string before it even starts searching.
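
If the matching line itself is needed and not just a yes/no answer, splitting can still be avoided. The sketch below is an illustrative assumption, not something from the original answer: the helper name `find_line` is hypothetical, and it assumes the target contains no newline. It searches the whole string once and then looks for the newlines on either side of the match.

```python
def find_line(text, target):
    """Return the first line containing target, or None if it is absent."""
    pos = text.find(target)  # single search over the whole string
    if pos == -1:
        return None
    # rfind returns -1 when the match is on the first line, so +1 gives 0.
    start = text.rfind("\n", 0, pos) + 1
    end = text.find("\n", pos)
    if end == -1:  # the match is on the last line
        end = len(text)
    return text[start:end]


print(find_line("alpha\nbeta target gamma\ndelta", "target"))
```

Only one substring is created here (the matching line), instead of one string per line plus a list.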

A simple benchmark shows this quickly:

import timeit

prepare = """
with open('bible.txt') as fh:
    text = fh.read()
"""

presplit_prepare = """
with open('bible.txt') as fh:
    text = fh.read()
lines = text.split('\\n')
"""

longsearch = """
'hello' in text
"""

splitsearch = """
for line in text.split('\\n'):
    if 'hello' in line:
        break
"""

presplitsearch = """
for line in lines:
    if 'hello' in line:
        break
"""


# Each statement is run 1000 times; the setup string runs once per Timer.
benchmark = timeit.Timer(longsearch, prepare)
print("IN on big string takes:", benchmark.timeit(1000), "seconds")

benchmark = timeit.Timer(splitsearch, prepare)
print("IN on splitted string takes:", benchmark.timeit(1000), "seconds")

benchmark = timeit.Timer(presplitsearch, presplit_prepare)
print("IN on pre-splitted string takes:", benchmark.timeit(1000), "seconds")

The results are:

IN on big string takes: 4.27126097679 seconds
IN on splitted string takes: 35.9622690678 seconds
IN on pre-splitted string takes: 11.815297842 seconds

The bible.txt file really is the Bible; I found it here: http://patriot.net/~bmcgin/kjvpage.html (text version)
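
For readers without that file, a self-contained variant of the same comparison might look like the sketch below. The synthetic text, the line count, the repeat count, and the function names are all assumptions made for illustration, and absolute timings will differ from the figures reported above.

```python
import timeit

# Synthetic stand-in for a large file (assumption): many lines without the
# target, followed by a single matching line at the very end.
LINES = ["the quick brown fox jumps over the lazy dog"] * 20000 + ["hello world"]
TEXT = "\n".join(LINES)


def long_search(text=TEXT):
    # One search over the whole string.
    return "hello" in text


def split_search(text=TEXT):
    # Split first, then search each line in a Python loop.
    for line in text.split("\n"):
        if "hello" in line:
            return True
    return False


def presplit_search(lines=LINES):
    # Same loop, but the cost of splitting is excluded from the timing.
    for line in lines:
        if "hello" in line:
            return True
    return False


if __name__ == "__main__":
    for name, func in [
        ("IN on big string", long_search),
        ("IN on split string", split_search),
        ("IN on pre-split string", presplit_search),
    ]:
        seconds = timeit.timeit(func, number=100)
        print(f"{name} takes: {seconds:.4f} seconds")
```

Placing the only match on the last line forces every variant to scan the full text, which keeps the comparison between them fair.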
