What is the best way to do a search in a large file?

Question

I'm looking to apply a KMP (or similar) search to a large file (> 4GB).

I expect this to cause problems, though: I can't copy the whole file into memory because there isn't enough space.

My question is, what is the best way to go about doing this search? Should I simply create a FILE* and do the search directly in the file, should I copy blocks (say 4k) to memory and search those, or something else completely?

Recommended answer

If you are using a platform that supports it, you can use mmap(). Paging through the file (reading it block by block) is also a possibility, but remember to keep the buffer as large as possible to reduce the I/O overhead, and be careful at the boundary between two blocks: a match may exist but be split across the boundary. Carrying the last (pattern length - 1) bytes of each block over into the next one is enough to catch such matches.
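The block-by-block approach can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: `find_in_stream` and its parameters are hypothetical names, the inner scan is a plain `memcmp` loop standing in for KMP or any other matcher, and the chunk size is left to the caller (in practice it would be far larger than the 4k mentioned in the question). Offsets are returned as `long long` on the assumption that it is 64 bits wide, so that positions beyond 4 GB fit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the byte offset of the first occurrence of
 * pat (pat_len bytes) in the stream, or -1 if it is absent or on error.
 * The stream is read sequentially in blocks of chunk_size bytes. */
long long find_in_stream(FILE *fp, const char *pat, size_t pat_len,
                         size_t chunk_size)
{
    if (fp == NULL || pat == NULL || pat_len == 0 || chunk_size == 0)
        return -1;

    /* A match can straddle two blocks by at most pat_len - 1 bytes. */
    size_t overlap = pat_len - 1;
    char *buf = malloc(overlap + chunk_size);
    if (buf == NULL)
        return -1;

    size_t carry = 0;       /* bytes kept from the previous block */
    long long base = 0;     /* file offset corresponding to buf[0] */
    long long result = -1;

    for (;;) {
        assert(carry <= overlap);
        size_t got = fread(buf + carry, 1, chunk_size, fp);
        size_t avail = carry + got;

        /* Naive scan; a KMP matcher could be dropped in here instead. */
        for (size_t i = 0; i + pat_len <= avail; i++) {
            if (buf[i] == pat[0] && memcmp(buf + i, pat, pat_len) == 0) {
                result = base + (long long)i;
                break;
            }
        }
        if (result >= 0 || got < chunk_size)
            break;          /* found, or short read (EOF or error) */

        /* Keep the tail so a match split across the boundary is seen. */
        size_t keep = avail < overlap ? avail : overlap;
        memmove(buf, buf + (avail - keep), keep);
        base += (long long)(avail - keep);
        carry = keep;
    }

    free(buf);
    return result;
}
```

A caller would open the file with `fopen(path, "rb")` and pass the resulting `FILE*` along with the pattern and a chunk size of its choosing. Because the function only reads forward and never seeks, it does not depend on `fseek` handling offsets beyond the range of `long`.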

Alternatively, I suggest you build an index of some sort and use it to restrict the search. KMP search is not particularly efficient. This of course depends on the nature of your file, how it gets created, etc.
