High performance "contains" search in list of strings in C#
Question
I have a list of approximately 500,000 strings, each about 100 characters long. Given a search term, I want to identify all strings in the list that contain the search term. At the moment I am doing this with a plain old DataSet using the Select method ("MATCH %term%"). This takes about 600 ms on my laptop. I'd like to make it faster, maybe 100-200 ms.
What would be a recommended approach?
Performance is critical, so I can trade memory footprint for better performance if necessary (within reason). The list of strings will not change once initialised, so calculating hashes up front would also be an option.
Does anyone have a recommendation, and which C# data structures are best suited to the task?
Recommended answer
I've heard good things about Lucene.NET (http://incubator.apache.org/lucene.net/) when it comes to performing quick full-text searches. They've done the work to figure out the fastest data structures and such to use. I'd suggest giving that a shot.
Otherwise, you might just try something like this:
var matches = list.AsParallel().Where(s => s.Contains(searchTerm)).ToList();
But it probably won't get you down to 100 ms.
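For anyone who wants to try the parallel scan, here is a minimal self-contained sketch of it. The class name `SubstringSearch`, the method `FindMatches`, and the `comparison` parameter are hypothetical names made up for illustration; they are not from the answer above. It assumes the strings are already in memory as a list, and it uses `IndexOf` with an explicit `StringComparison` so that case-insensitive matching can be switched on if needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubstringSearch
{
    // Hypothetical helper (illustration only): returns every string in
    // `items` that contains `term`, scanning the list in parallel with PLINQ.
    // `comparison` defaults to Ordinal (case-sensitive, culture-independent);
    // pass StringComparison.OrdinalIgnoreCase for case-insensitive matching.
    public static List<string> FindMatches(
        IReadOnlyList<string> items,
        string term,
        StringComparison comparison = StringComparison.Ordinal)
    {
        return items
            .AsParallel()
            .AsOrdered() // keep results in the same order as the source list
            .Where(s => s.IndexOf(term, comparison) >= 0)
            .ToList();
    }
}
```

Calling `SubstringSearch.FindMatches(list, searchTerm)` should give the same matches as the one-liner above, in source order. Dropping `AsOrdered()` may be slightly faster if the order of the results does not matter. Whether this gets anywhere near the target time depends on the machine and core count, so it is worth measuring on the real data.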