对于大集合LINQ性能 [英] LINQ Performance for Large Collections

查看:147
本文介绍了对于大集合LINQ性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个大集合的字符串(可达1M)按字母顺序排序。我已经尝试使用的HashSet,SortedDictionary和词典对这个集合LINQ查询。我是静态缓存集合,它是由50MB的大小,而我总是打电话对缓存的集合的LINQ查询。我的问题如下:

I have a large collection of strings (up to 1M) alphabetically sorted. I have experimented with LINQ queries against this collection using HashSet, SortedDictionary, and Dictionary. I am static caching the collection, it's up to 50MB in size, and I'm always calling the LINQ query against the cached collection. My problem is as follows:

无论集合类型,性能穷得多比SQL(多达200毫秒)。在做对基础SQL表的类似查询,性能快得多(5-10ms)。我已经实现了我的LINQ查询,如下所示:

Regardless of collection type, performance is much poorer than SQL (up to 200ms). When doing a similar query against the underlying SQL tables, performance is much quicker ( 5-10ms). I have implemented my LINQ queries as follows:

public static string ReturnSomething(string query, int limit)
{
  StringBuilder sb = new StringBuilder();
  foreach (var stringitem in MyCollection.Where(
      x => x.StartsWith(query) && x.Length > q.Length).Take(limit))
  {
      sb.Append(stringitem);
  }

  return sb.ToString();
}

这是我的理解是,HashSet的,字典等,实现用二叉树的搜索,而不是标准枚举查找。我有哪些选择高性能LINQ查询到高级的集合类型?

It is my understanding that the HashSet, Dictionary, etc. implement lookups using binary tree search instead of the standard enumeration. What are my options for high performance LINQ queries into the advanced collection types?

推荐答案

在你目前的code你不使用任何的特殊功能词典 / SortedDictionary / 的HashSet 的集合,您使用的是他们,你会使用<$ C $以同样的方式C>列表。这就是为什么你没有看到任何性能上的差异。

In your current code you don't make use of any of the special features of the Dictionary / SortedDictionary / HashSet collections, you are using them the same way that you would use a List. That is why you don't see any difference in performance.

如果您使用字典作为指标,其中字符串的前几个字符是关键,字符串列表是价值,可以从搜索字符串挑选出字符串的整个集合,有可能一小部分匹配。

If you use a dictionary as index where the first few characters of the string is the key and a list of strings is the value, you can from the search string pick out a small part of the entire collection of strings that has possible matches.

我写了下面的类来进行测试。如果我填充它一百万字符串和搜索有八个字符的字符串,它撕裂通过一切可能的匹配在约3毫秒。用一个字符序列检索是最坏的情况,但它找到的第1000个匹配在约4毫秒。找到所有匹配一个字符串大约需要25毫秒。

I wrote the class below to test this. If I populate it with a million strings and search with an eight character string it rips through all possible matches in about 3 ms. Searching with a one character string is the worst case, but it finds the first 1000 matches in about 4 ms. Finding all matches for a one character strings takes about 25 ms.

类为1,2,4和8字符键创建索引。如果你看一下你的具体数据和你搜索,你应该能够选择哪些索引创建,优化为您的条件。

The class creates indexes for 1, 2, 4 and 8 character keys. If you look at your specific data and what you search for, you should be able to select what indexes to create to optimise it for your conditions.

public class IndexedList {

    private class Index : Dictionary<string, List<string>> {

        private int _indexLength;

        public Index(int indexLength) {
            _indexLength = indexLength;
        }

        public void Add(string value) {
            if (value.Length >= _indexLength) {
                string key = value.Substring(0, _indexLength);
                List<string> list;
                if (!this.TryGetValue(key, out list)) {
                    Add(key, list = new List<string>());
                }
                list.Add(value);
            }
        }

        public IEnumerable<string> Find(string query, int limit) {
            return
                this[query.Substring(0, _indexLength)]
                .Where(s => s.Length > query.Length && s.StartsWith(query))
                .Take(limit);
        }

    }

    private Index _index1;
    private Index _index2;
    private Index _index4;
    private Index _index8;

    public IndexedList(IEnumerable<string> values) {
        _index1 = new Index(1);
        _index2 = new Index(2);
        _index4 = new Index(4);
        _index8 = new Index(8);
        foreach (string value in values) {
            _index1.Add(value);
            _index2.Add(value);
            _index4.Add(value);
            _index8.Add(value);
        }
    }

    public IEnumerable<string> Find(string query, int limit) {
        if (query.Length >= 8) return _index8.Find(query, limit);
        if (query.Length >= 4) return _index4.Find(query,limit);
        if (query.Length >= 2) return _index2.Find(query,limit);
        return _index1.Find(query, limit);
    }

}

这篇关于对于大集合LINQ性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆