Fastest text search method in a large text file

Question

I am doing a text search in a fairly large txt file (100k lines, 7 MB). The text is not that big, but I need to run a lot of searches. I want to look for a target string and return the line where it appears. My text file is formatted so that the target can only appear on one line.

What is the most efficient way? I do a lot of searches, so I want to improve speed. Here is my code right now:

import io
import os

def lookup_line(target):
    # Returns the first line containing target, or None if it doesn't exist.
    path = os.path.join(os.path.dirname(__file__), 'file.txt')
    # io.open decodes UTF-8 while reading and closes the file on exit.
    with io.open(path, 'r', encoding='utf-8') as f:
        for line in f:
            if target in line:
                return line
    return None  # end of file, nothing has been found

I use this Python code in a Google App Engine app.

Thanks!

Recommended answer

  1. Load the whole text in RAM at once. Don't read line by line.
  2. Search for the pattern in the blob. If you find it at position pos, use text.count('\n',0,pos) to get the (zero-based) line number.
  3. If you don't need the line number, look for the previous and next EOL to cut the line out of the text.
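
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the function name find_line and the sample text are made up for the example, and in practice text would be read once from the file (for example with read()) and reused across searches.

```python
def find_line(text, target):
    """Return (line_number, line) for the first occurrence of target, or None."""
    pos = text.find(target)
    if pos == -1:
        return None
    # The number of newlines before the match is the zero-based line number.
    line_number = text.count("\n", 0, pos)
    # Previous EOL: rfind returns -1 on the first line, so start becomes 0.
    start = text.rfind("\n", 0, pos) + 1
    # Next EOL: -1 means the last line has no trailing newline.
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)
    return line_number, text[start:end]


# Hypothetical sample data standing in for the loaded file contents.
text = "alpha=1\nbeta=2\ngamma=3\n"
print(find_line(text, "beta"))  # (1, 'beta=2')
```

Because the text is loaded only once, each later search costs just the built-in find call plus two short scans for the surrounding line breaks.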

An explicit line-by-line loop in Python is slow, whereas the built-in string search is very fast. If you need to look for several strings, use regular expressions.
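
For the several-strings case, one possible sketch combines the targets into a single alternation so the text is scanned once. The function name find_lines and the sample data are hypothetical, and the sketch assumes the targets do not overlap each other in the text, since finditer only reports non-overlapping matches.

```python
import re


def find_lines(text, targets):
    """Map each target found in text to the line it first appears on."""
    if not targets:
        return {}
    # Longest first, so a target is not shadowed by one of its own prefixes.
    ordered = sorted(targets, key=len, reverse=True)
    # re.escape makes every target match literally, not as a pattern.
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    found = {}
    for match in pattern.finditer(text):
        start = text.rfind("\n", 0, match.start()) + 1
        end = text.find("\n", match.end())
        if end == -1:
            end = len(text)
        # Keep only the first line seen for each target.
        found.setdefault(match.group(), text[start:end])
    return found


# Hypothetical sample data standing in for the loaded file contents.
text = "alpha=1\nbeta=2\ngamma=3\n"
print(find_lines(text, ["gamma", "alpha", "zeta"]))
```

Targets that never occur, such as "zeta" here, are simply absent from the returned dictionary.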

If that's not fast enough, use an external program like grep.
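
Calling grep from Python could look roughly like the sketch below. It assumes Python 3.7 or later, a grep binary on the PATH that supports the -m option (GNU and BSD grep do), and an environment that allows spawning subprocesses, which may not hold in every hosted sandbox. The function name grep_first_line is made up for the example.

```python
import subprocess


def grep_first_line(target, path):
    """Return the first line of path containing target, or None if absent."""
    result = subprocess.run(
        # -F: fixed string, -m 1: stop after the first match,
        # -e: treat target as the pattern even if it starts with "-".
        ["grep", "-F", "-m", "1", "-e", target, path],
        capture_output=True,
        encoding="utf-8",
    )
    if result.returncode == 0:
        return result.stdout.rstrip("\n")
    if result.returncode == 1:
        return None  # grep exits with status 1 when nothing matched
    raise RuntimeError(result.stderr)
```

Starting a process for every lookup has its own overhead, so this is only likely to pay off for files much larger than the one described in the question.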
