Matching a string in a large text file?
Question
I have a list of about 7 million strings in a text file of size 152MB. I was wondering what the best way would be to implement a function that takes a single string and returns whether it is in that list of strings.
Recommended answer
Are you going to have to match against this text file several times? If so, I'd create a HashSet<string>. Otherwise, just read it line by line (I'm assuming there's one string per line) and see whether it matches.
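For the one-off case, a minimal sketch of the line-by-line check might look like the following. It assumes one string per line and an exact, case-sensitive match; the class and method names are made up for illustration, and File.ReadLines requires .NET 4 or later.

```csharp
using System;
using System.IO;
using System.Linq;

static class SingleLookup
{
    // Hypothetical helper: streams the file lazily and stops at the first
    // line that matches exactly (ordinal, case-sensitive comparison).
    public static bool FileContainsLine(string path, string stringToCheck)
    {
        return File.ReadLines(path)
                   .Any(line => string.Equals(line, stringToCheck, StringComparison.Ordinal));
    }
}
```

Because File.ReadLines enumerates lazily, only one line is held in memory at a time, but every call rescans the file from the start, which is why this only makes sense for a single lookup.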
152MB of ASCII will end up as over 300MB of Unicode data in memory (.NET strings are UTF-16, so each ASCII character takes two bytes), but modern machines have plenty of memory, so keeping the whole lot in a HashSet<string> will make repeated lookups very fast indeed.
The absolute simplest way to do this is probably to use File.ReadAllLines, although that will create an array which will then be discarded. That's not great for memory usage, but probably not too bad:
HashSet<string> strings = new HashSet<string>(File.ReadAllLines("data.txt"));
...
if (strings.Contains(stringToCheck))
{
...
}
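If the temporary array is a concern, one possible variation is to build the set from File.ReadLines (available from .NET 4), which yields lines lazily instead of materialising them all first. This is only a sketch: the class and method names are hypothetical, and the ordinal comparer is an assumption about the kind of match wanted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StringListIndex
{
    // Hypothetical helper: fills the set straight from the lazy line
    // enumeration, so no intermediate string[] is allocated.
    public static HashSet<string> Load(string path)
    {
        return new HashSet<string>(File.ReadLines(path), StringComparer.Ordinal);
    }
}
```

The set would be loaded once, for example with StringListIndex.Load("data.txt"), and then reused for every Contains call. The strings themselves still occupy the same memory as before; only the short-lived array is avoided.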