High performance "contains" search in list of strings in C#

Question

I have a list of approx. 500,000 strings, each approx. 100 characters long. Given a search term, I want to identify all strings in the list that contain the search term. At the moment I am doing this with a plain old dataset using the Select method ("MATCH %term%"). This takes about 600ms on my laptop. I'd like to make it faster, maybe 100-200ms.

What is the recommended approach?

Performance is critical so I can trade memory footprint for better performance if necessary (within reason). The list of strings will not change once initialised so calculating hashes would also be an option.

Does anyone have a recommendation on which C# data structures are best suited to the task?

Recommended answer

I've heard good things about Lucene.NET when it comes to performing quick full-text searches. They've done the work to figure out the fastest data structures and such to use. I'd suggest giving that a shot.

Otherwise, you might just try something like this:

var matches = list.AsParallel().Where(s => s.Contains(searchTerm)).ToList();

But it probably won't get you down to 100ms.
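
To try the one-liner above in isolation, here is a minimal sketch of it wrapped in a helper, along with a small timing function. The class name `ContainsSearch` and the methods `FindMatches` and `Time` are hypothetical names chosen for illustration, not part of the original answer. The sketch also assumes an ordinal (case-sensitive) match is what is wanted, and uses `IndexOf` with `StringComparison.Ordinal` to make that explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class ContainsSearch
{
    // Returns every string in `items` that contains `term`,
    // scanning the list in parallel with PLINQ.
    public static List<string> FindMatches(IReadOnlyList<string> items, string term)
    {
        return items
            .AsParallel()
            .AsOrdered() // preserve the original list order in the result
            // Assumption: an ordinal, case-sensitive match is wanted.
            .Where(s => s.IndexOf(term, StringComparison.Ordinal) >= 0)
            .ToList();
    }

    // Runs `action` once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling `ContainsSearch.Time(() => ContainsSearch.FindMatches(list, searchTerm))` a few times after a warm-up run should give a rough idea of whether the parallel scan gets close to the target on a given machine. `AsOrdered()` adds some overhead and can be dropped if result order does not matter.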
