Fastest way to search text file for string


Question

What is the *fastest* way in .NET to search large on-disk text files (100+ MB)
for a given string?

The files are unindexed and unsorted and, for the purposes of my immediate
requirements, can't be indexed or sorted.

I don't want to load the entire file into physical memory; memory-mapped files
are OK (and preferred). Speed is a requirement: the target is to
locate the string in 10 seconds or less for a 100 MB file. The search string
is typically 10 characters or fewer. Finally, I don't want to spawn an
external executable (e.g. grep); the algorithm should be included directly in
the .NET code base. For the first rev, wildcard support is not a requirement.

Thanks for any pointers!

Recommended answer

I would suggest that you have a look at the Regex implementation. I think Regex
is the fastest option when it comes to scanning.
You might need to use a FileStream to load the file, so I don't think it's the
most appropriate answer.
Anyway, make a local copy of one of those files and give Regex a try. See if
it comes anywhere near the 10-second mark.

--

Regards,

Hermit Dave
(http://hdave.blogspot.com)
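
For anyone trying the FileStream route suggested above, here is a minimal sketch of a chunked scan. It is not from the thread, and it swaps Regex for a plain ordinal byte search. The names FileSearch, IndexOf and bufferSize are made up for illustration. It assumes the file and the search string use a single-byte encoding (ASCII here) and that the runtime provides the span-based IndexOf (.NET Core 2.1 or later). Only one buffer is in memory at a time, and the last (pattern length - 1) bytes of each chunk are carried over so a match that straddles two reads is not missed.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileSearch
{
    // Returns the byte offset of the first occurrence of `needle` in the file,
    // or -1 if it is not found. Assumption: single-byte (ASCII) text.
    public static long IndexOf(string path, string needle, int bufferSize = 1 << 20)
    {
        byte[] pattern = Encoding.ASCII.GetBytes(needle);
        if (pattern.Length == 0) return 0;

        int overlap = pattern.Length - 1;
        byte[] buffer = new byte[Math.Max(bufferSize, pattern.Length) + overlap];
        int carried = 0;      // bytes kept from the end of the previous chunk
        long chunkStart = 0;  // file offset of buffer[0]

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, FileOptions.SequentialScan))
        {
            int read;
            while ((read = stream.Read(buffer, carried, buffer.Length - carried)) > 0)
            {
                int valid = carried + read;
                var window = new ReadOnlySpan<byte>(buffer, 0, valid);
                int hit = window.IndexOf(new ReadOnlySpan<byte>(pattern));
                if (hit >= 0) return chunkStart + hit;

                // Keep the tail so a match straddling two chunks is still seen.
                int keep = Math.Min(overlap, valid);
                Buffer.BlockCopy(buffer, valid - keep, buffer, 0, keep);
                chunkStart += valid - keep;
                carried = keep;
            }
        }
        return -1;
    }
}
```

Calling FileSearch.IndexOf(path, "some text") returns the byte offset of the first hit or -1. Whether this meets the 10-second target depends on the disk and the machine, so it is worth timing against a local copy of a real file, as suggested above.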



I wouldn't spend any more time on seeing *if* you can do this until you find
out *why* you have to do this!

A 100 MB flat file? This is exactly the reason why relational databases were
made and are still used for just about everything. Without knowing more
about your app, I'd rather take the 2 minutes to load this into a SQL table and
build an index. Then what you want to do suddenly becomes quick
(sub-second), simple, and will support wildcards later. Maybe bulk-load your
file at night and have your front end hit the database during the day?

I don't think you will be happy with just about any solution. Every response
you get to this is going to be either way too slow or way too
complicated. You're reinventing the wheel!

My .02