High performance "contains" search in list of strings in C#

Question

I have a list of approximately 500,000 strings, each approximately 100 characters long. Given a search term, I want to identify all strings in the list that contain the search term. At the moment I am doing this with a plain old dataset using the Select method ("MATCH %term%"). This takes about 600ms on my laptop. I'd like to make it faster, maybe 100-200ms.

What would be a recommended approach?

Performance is critical, so I can trade memory footprint for better performance if necessary (within reason). The list of strings will not change once initialised, so calculating hashes would also be an option.

Does anyone have a recommendation, and which C# data structures are best suited to the task?

Recommended answer

I've heard good things about Lucene.NET (http://incubator.apache.org/lucene.net/) when it comes to performing quick full-text searches. They've done the work to figure out the fastest data structures and such to use. I'd suggest giving that a shot.

Otherwise, you might just try something like this:

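// Requires "using System.Linq;" (PLINQ). Result order is not guaranteed unless AsOrdered() is added.
var matches = list.AsParallel().Where(s => s.Contains(searchTerm)).ToList();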

But it probably won't get you down to 100ms.
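
Whether the parallel scan is fast enough depends on the machine, so it is worth measuring. Below is a minimal sketch of how the one-liner above could be wrapped and timed. The names `ContainsSearch` and `FindMatches`, the synthetic random data and the search terms are assumptions made up for illustration and are not part of the original question or answer. It uses `IndexOf` with `StringComparison.Ordinal`, which behaves like `s.Contains(searchTerm)` but makes the comparison mode explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ContainsSearch
{
    // Hypothetical helper: returns every string in `list` that contains
    // `searchTerm`, scanning the list on multiple cores via PLINQ.
    // Result order is not guaranteed because AsOrdered() is not used.
    public static List<string> FindMatches(IEnumerable<string> list, string searchTerm)
    {
        return list
            .AsParallel()
            // Ordinal comparison: case-sensitive, no culture-specific rules.
            .Where(s => s.IndexOf(searchTerm, StringComparison.Ordinal) >= 0)
            .ToList();
    }

    public static void Main()
    {
        // Synthetic stand-in for the real data (an assumption):
        // 500,000 random strings, each 100 characters long.
        var rng = new Random(42);
        const string alphabet = "abcdefghijklmnopqrstuvwxyz ";
        var list = Enumerable.Range(0, 500_000)
            .Select(_ => new string(Enumerable.Range(0, 100)
                .Select(__ => alphabet[rng.Next(alphabet.Length)])
                .ToArray()))
            .ToList();

        // Warm-up run so JIT compilation is not included in the timing.
        FindMatches(list, "warm");

        var sw = Stopwatch.StartNew();
        var matches = FindMatches(list, "term");
        sw.Stop();

        Console.WriteLine($"{matches.Count} matches in {sw.ElapsedMilliseconds} ms");
    }
}
```

If a case-insensitive search is needed, `StringComparison.OrdinalIgnoreCase` can be swapped in, though that will likely cost some extra time per comparison.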
