What is the fastest way to generate a unique set in .NET 2.0?


Problem description

I have what is essentially a jagged array of name/value pairs, and I need to generate a set of unique name/value pairs from it. The jagged array is approximately 86,000 x 11 values. It does not matter to me how I store a name/value pair (a single string "name=value", or a specialised class such as KeyValuePair).
Additional info: there are 40 distinct names and a larger number of distinct values, probably in the region of 10,000 values.

I am using C# and .NET 2.0 (and the performance is so poor that I am thinking it may be better to push my entire jagged array into a SQL database and do a SELECT DISTINCT from there).

Below is the code I'm currently using:

List<List<KeyValuePair<string,string>>> vehicleList = retriever.GetVehicles();
this.statsLabel.Text = "Unique Vehicles: " + vehicleList.Count;

Dictionary<KeyValuePair<string, string>, int> uniqueProperties = new Dictionary<KeyValuePair<string, string>, int>();
foreach (List<KeyValuePair<string, string>> vehicle in vehicleList)
{
    foreach (KeyValuePair<string, string> property in vehicle)
    {
        if (!uniqueProperties.ContainsKey(property))
        {
            uniqueProperties.Add(property, 0);
        }
    }
}
this.statsLabel.Text += "\rUnique Properties: " + uniqueProperties.Count;

Solution

I have it running in 0.34 seconds, down from 9+ minutes.

The problem lies in comparing the KeyValuePair structs. I worked around it by writing a comparer object and passing an instance of it to the Dictionary.

From what I can determine, KeyValuePair.GetHashCode() returns the hash code of its Key object (in this example the less unique of the two fields, since there are only 40 distinct names).

As the dictionary adds (and checks the existence of) each item, it uses both the Equals and GetHashCode functions, but it has to fall back on Equals whenever several different items share the same hash code.

By providing a GetHashCode function that produces more distinct values, the dictionary exercises the Equals function far less often. I also optimised the Equals function to compare the more unique Values before the less unique Keys.

86,000 * 11 items with 10,000 unique properties run in 0.34 seconds using the comparer object below (without the comparer object it takes 9 minutes 22 seconds).

Hope this helps :)

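    using System.Collections.Generic;

    class StringPairComparer
        : IEqualityComparer<KeyValuePair<string, string>>
    {
        public bool Equals(KeyValuePair<string, string> x, KeyValuePair<string, string> y)
        {
            // Compare the more unique Values first so that mismatches fail fast.
            return x.Value == y.Value && x.Key == y.Key;
        }
        public int GetHashCode(KeyValuePair<string, string> obj)
        {
            // Hash both Key and Value so that pairs sharing a Key spread across buckets.
            return (obj.Key + obj.Value).GetHashCode();
        }
    }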
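
For completeness, here is a minimal sketch of how a comparer like this might be handed to the Dictionary constructor. The comparer is repeated so the snippet stands alone. `UniquePropertyCounter` and `CountUnique` are hypothetical names used only for illustration, and the indexer assignment is a substitution for the question's ContainsKey/Add pair, not part of the original answer. It sticks to C# 2.0 features (no HashSet, no LINQ).

```csharp
using System.Collections.Generic;

class StringPairComparer : IEqualityComparer<KeyValuePair<string, string>>
{
    public bool Equals(KeyValuePair<string, string> x, KeyValuePair<string, string> y)
    {
        return x.Value == y.Value && x.Key == y.Key;
    }

    public int GetHashCode(KeyValuePair<string, string> obj)
    {
        return (obj.Key + obj.Value).GetHashCode();
    }
}

// Hypothetical helper, named here only for the example.
static class UniquePropertyCounter
{
    public static int CountUnique(List<List<KeyValuePair<string, string>>> vehicleList)
    {
        // Passing the comparer to the constructor makes the dictionary use it
        // for both hashing and equality checks on its keys.
        Dictionary<KeyValuePair<string, string>, int> uniqueProperties =
            new Dictionary<KeyValuePair<string, string>, int>(new StringPairComparer());

        foreach (List<KeyValuePair<string, string>> vehicle in vehicleList)
        {
            foreach (KeyValuePair<string, string> property in vehicle)
            {
                // The indexer overwrites an existing entry, so a separate
                // ContainsKey call (a second lookup) is not needed.
                uniqueProperties[property] = 0;
            }
        }

        return uniqueProperties.Count;
    }
}
```

With the question's data this would be called as `UniquePropertyCounter.CountUnique(retriever.GetVehicles())`.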

EDIT: If the key were just one string (instead of a KeyValuePair, i.e. string = Name + Value) it would be approximately twice as fast. It's a nice, interesting problem, and I have spent faaaaaar too much time on it (I learned quite a bit though).
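
The single-string variant mentioned in the edit could look roughly like the sketch below. It assumes the "name=value" form from the question, and that names never contain "=" (otherwise two different pairs could produce the same string). `UniqueStringKeys` and `CountUnique` are hypothetical names, and the "twice as fast" figure is the answer's own measurement, not something this sketch demonstrates.

```csharp
using System.Collections.Generic;

// Hypothetical helper, named here only for the example.
static class UniqueStringKeys
{
    public static int CountUnique(List<List<KeyValuePair<string, string>>> vehicleList)
    {
        // With plain string keys the dictionary uses the built-in string
        // hashing and equality, so no custom comparer is required.
        Dictionary<string, bool> seen = new Dictionary<string, bool>();

        foreach (List<KeyValuePair<string, string>> vehicle in vehicleList)
        {
            foreach (KeyValuePair<string, string> property in vehicle)
            {
                // Assumption: "=" does not occur in names, so the joined key is unambiguous.
                seen[property.Key + "=" + property.Value] = true;
            }
        }

        return seen.Count;
    }
}
```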
