Matching a string in a large text file?

Question

I have a list of about 7 million strings in a text file of size 152MB. I was wondering what would be the best way to implement a function that takes a single string and returns whether it is in that list of strings.

Recommended answer

Are you going to have to match against this text file several times? If so, I'd create a HashSet<string>. Otherwise, just read it line by line (I'm assuming there's one string per line) and see whether it matches.
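For the one-off case, the line-by-line scan could be sketched roughly as follows. This is only an illustration under the same assumption of one string per line; the names OneOffLookup and ContainsLine are hypothetical, and the exact, case-sensitive (ordinal) comparison is an assumption about what counts as a match.

```csharp
using System;
using System.IO;
using System.Linq;

static class OneOffLookup
{
    // Hypothetical helper: returns true if any line of the file equals candidate.
    // File.ReadLines reads lazily, so the scan stops at the first match and
    // never holds the whole file in memory.
    public static bool ContainsLine(string path, string candidate)
    {
        return File.ReadLines(path)
                   .Any(line => string.Equals(line, candidate, StringComparison.Ordinal));
    }
}
```

Each call rescans the file from the start, so this only makes sense when lookups are rare.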

152MB of ASCII will end up as over 300MB of Unicode data in memory, since .NET strings store two bytes per character. Modern machines have plenty of memory, though, so keeping the whole lot in a HashSet<string> will make repeated lookups very fast indeed.

The absolute simplest way to do this is probably to use File.ReadAllLines, although that will create an array which will then be discarded. That's not great for memory usage, but probably not too bad:

HashSet<string> strings = new HashSet<string>(File.ReadAllLines("data.txt"));
...

if (strings.Contains(stringToCheck))
{
    ...
}
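If the temporary array is a concern, one possible variation (not part of the original answer) is to build the set from File.ReadLines, which yields lines lazily. The sketch below assumes one string per line; the names StringSetLoader and Load are hypothetical, and the ordinal comparer is an assumption about the matching rules wanted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class StringSetLoader
{
    // Hypothetical helper: loads every line of the file into a HashSet<string>.
    // File.ReadLines streams the lines one at a time, so no intermediate
    // string[] is allocated the way it is with File.ReadAllLines.
    public static HashSet<string> Load(string path)
    {
        return new HashSet<string>(File.ReadLines(path), StringComparer.Ordinal);
    }
}
```

The set would be loaded once, for example with StringSetLoader.Load("data.txt"), and then queried with Contains as in the snippet above.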
