System.Collections.Generic.Dictionary = Ultimate performance?


Question


I'm writing a haXe C# target, and I've been studying the performance of haXe's std library so we can provide the best performance possible through its cross-platform code.


One very good example is the hash table code. I was a little reluctant to use .NET's Dictionary, as it seems bulky (the structs for key/value pairs can take up a huge amount of memory because of alignment, on top of the extra bookkeeping information each entry holds), and since the std library has no such thing as an object hash, I really thought I could squeeze out a little performance by not having to call GetHashCode, and inlining it all along.


Also it's clear that the Dictionary implementation uses a linked list to deal with collisions, which is far from ideal.


So we started to implement our own solution, starting with IntHash (Dictionary). We first implemented Hopscotch hashing, but it really didn't turn out very well; it was also fairly obvious that it wouldn't cope well with huge hash tables, since H is normally a machine word and performance degrades as H / Length increases.


We then jumped to implementing a khash-inspired algorithm. This one had a lot of potential, as its benchmarks are impressive and it handles collisions in the same array. It also had some great properties, like being able to resize without needing twice the memory.


The benchmarks were disappointing. Of course, there is no need to say that memory usage was much lower in our implementation than in Dictionary's. But I was hoping to get a nice performance boost as well, and unfortunately that was not the case. It wasn't too far below -- less than an order of magnitude -- but for both sets and gets, .NET's implementation still performed better.


So my question is: is that the best we have for C#? I tried looking for a custom solution, and it seems there is almost none. There is the C5 generic collection library, but the code is so cluttered I didn't even test it, and I found no benchmarks for it either.


So... Is that it? Should I just wrap around Dictionary<>?

Thanks!

Answer


I've found that the .NET Dictionary performs well, if not exceptionally well, in most situations. It's a good general purpose implementation. The problem I most often run into is the 2-gigabyte limit. On a 64-bit system, you can't add more than about 89.5 million items to a dictionary (when the key is an integer or a reference, and the value is a reference). Dictionary overhead appears to be 24 bytes per item.
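The ~24 bytes/item figure lines up with the per-entry struct visible in the publicly available .NET reference source. A rough sketch of its shape for `TKey = int`, `TValue = object` on 64-bit (field names and exact layout vary across framework versions, so treat this as illustrative):

```csharp
// Rough shape of the per-item entry inside .NET's Dictionary<TKey, TValue>,
// based on the public reference source (illustrative, not the real declaration).
// For TKey = int, TValue = object on 64-bit:
//   hashCode (4) + next (4) + key (4) + padding (4) + value reference (8)
//   = 24 bytes, matching the per-item overhead mentioned above.
struct Entry
{
    public int hashCode;  // cached hash code of the key
    public int next;      // index of the next entry sharing this bucket, or -1
    public int key;       // TKey
    public object value;  // TValue
}
```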


That limit makes itself known in a very odd way. The Dictionary seems to grow by doubling--when it gets full, it increases capacity to the next prime number that's at least double the current size. Because of that, the dictionary will grow to about 47 million and then throw an exception because when it tries to double (to 94 million), the memory allocation fails (due to the 2 gigabyte limit). I get around the problem by pre-allocating the Dictionary (i.e. call the constructor that lets you specify the capacity). That also speeds up populating the dictionary because it never has to grow, which entails allocating a new array and re-hashing everything.
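A minimal sketch of the pre-allocation described above (the item count here is arbitrary, just for illustration):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Pass the expected item count to the constructor so the Dictionary
        // allocates its internal arrays once up front and never has to grow.
        // Each growth allocates new arrays at the next prime >= double the
        // current size and re-hashes every entry, so avoiding it also makes
        // the initial population noticeably faster.
        const int count = 1_000_000;
        var dict = new Dictionary<int, object>(count);

        for (int i = 0; i < count; i++)
            dict[i] = null;

        Console.WriteLine(dict.Count); // prints 1000000
    }
}
```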


What makes you say that Dictionary uses a linked list for collision resolution? I'm pretty sure it uses open addressing, but I don't know how it does the probes. I guess if it does linear probing, then the effect is similar to what you'd get with a linked list.


We wrote our own BigDictionary class to get past the 2-gigabyte limit and found that a straightforward open addressing scheme with linear probing gives reasonably good performance. It's not as fast as Dictionary, but it can handle hundreds of millions of items (billions if I had the memory).
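The BigDictionary class itself isn't shown here; the following is only a minimal sketch of the scheme it describes -- open addressing with linear probing for int keys. The class and member names are illustrative assumptions, and it omits deletion and resizing:

```csharp
using System;

// Minimal sketch of open addressing with linear probing (NOT the actual
// BigDictionary). A slot's occupancy is tracked in a separate bool array
// to keep the example simple; deletion and resizing are omitted.
class LinearProbeTable
{
    private readonly int[] _keys;
    private readonly long[] _values;
    private readonly bool[] _occupied;
    private readonly int _capacity;

    public LinearProbeTable(int capacity)
    {
        _capacity = capacity;
        _keys = new int[capacity];
        _values = new long[capacity];
        _occupied = new bool[capacity];
    }

    public void Set(int key, long value)
    {
        int i = (key & 0x7FFFFFFF) % _capacity;
        // Probe forward one slot at a time until we find the key or a free slot.
        while (_occupied[i] && _keys[i] != key)
            i = (i + 1) % _capacity;
        _keys[i] = key;
        _values[i] = value;
        _occupied[i] = true;
    }

    public bool TryGet(int key, out long value)
    {
        int i = (key & 0x7FFFFFFF) % _capacity;
        // A run of occupied slots ends at the first free slot; if we reach
        // one without matching the key, the key is absent.
        while (_occupied[i])
        {
            if (_keys[i] == key) { value = _values[i]; return true; }
            i = (i + 1) % _capacity;
        }
        value = 0;
        return false;
    }
}
```

Since colliding entries sit in adjacent array slots, probes stay cache-friendly, which is a large part of why such a simple scheme performs reasonably well.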


That said, you should be able to write a faster task-specific hash table that outperforms the .NET Dictionary in some situations. But for a general purpose hash table I think you'll be hard pressed to do better than what the BCL provides.

