How do I re.search or re.match on a whole file without reading it all into memory?


Question

I want to run a regular expression over an entire file, but without reading the whole file into memory at once, because I may be working with rather large files in the future. Is there a way to do this? Thanks!

Clarification: I cannot read the file line by line, because a match can span multiple lines.

Recommended answer

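You can use mmap to map the file into memory. The re functions accept the mapped object directly, so the file contents can be searched much like a normal string. In Python 3 the mapping behaves like a bytes object, so the pattern must be a bytes pattern as well: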

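import mmap
import re

# Open in binary mode; the mapping exposes bytes, so the pattern must be bytes too.
with open('/var/log/error.log', 'rb') as f:
    # Length 0 maps the whole file; ACCESS_READ means no write permission is needed.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
        mo = re.search(rb'error: (.*)', data)
        if mo:
            print("found error", mo.group(1).decode(errors='replace'))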

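This also works for big files: the operating system loads the file content from disk page by page as it is accessed, rather than copying the whole file into memory up front. Note that the mapping still has to fit in the process's address space, which can be a limit on 32-bit systems.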
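Because the question says a match may span several lines, here is a minimal sketch of the same mmap approach applied to a multi-line pattern. The helper name `find_in_file`, the temporary demo file, and the `BEGIN`/`END` markers are all hypothetical and only serve as illustration; the empty-file check is there because mmap refuses to map a zero-length file.

```python
import mmap
import os
import re
import tempfile


def find_in_file(path, pattern, flags=0):
    """Return (offset, matched bytes) for every match of a bytes pattern.

    Hypothetical helper: the file is memory-mapped instead of being read
    into a Python bytes object first.
    """
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    results = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
            for mo in re.finditer(pattern, data, flags):
                # mo.group(0) copies the matched bytes out of the mapping,
                # so the results stay valid after the mapping is closed.
                results.append((mo.start(), mo.group(0)))
    return results


# Demo on a small temporary file (stand-in for a large one).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ok\nBEGIN\nline one\nline two\nEND\nok\n")
    demo_path = tmp.name

# re.DOTALL lets "." match newlines, so the match can cross line boundaries.
matches = find_in_file(demo_path, rb"BEGIN.*?END", re.DOTALL)
os.remove(demo_path)
print(matches)
```

The matched bytes are copied out inside the `with` block on purpose: once the mapping is closed, a match object that still refers to it can no longer return its groups.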
