In C# why is it faster to Create a HashSet from a List, instead of starting with a HashSet?


Problem description


I have a method that takes an upper limit and returns a list of prime numbers up to that limit.

    public static List<int> AllPrimesUnder(int upperLimit)

I later decided that I really just needed to do lookups on the list, often just asking the question "Is This Prime". Since I was dealing with all primes under values like a million, I realized that HashSet was the structure I should be using. Certainly the lookup using the result of the method was faster, but the method itself was slower.
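The lookup side of that trade-off looks roughly like the following sketch (not the actual code; the names here are illustrative):

    using System;
    using System.Collections.Generic;

    class LookupSketch
    {
        static void Main()
        {
            // Hypothetical data standing in for the real prime list.
            List<int> primeList = new List<int> { 2, 3, 5, 7, 11, 13 };
            HashSet<int> primeSet = new HashSet<int>(primeList);

            int candidate = 11;

            // List<T>.Contains scans the list front to back: O(n) per query.
            bool fromList = primeList.Contains(candidate);

            // HashSet<T>.Contains hashes the value and checks one bucket: O(1) on average.
            bool fromSet = primeSet.Contains(candidate);

            Console.WriteLine("{0} {1}", fromList, fromSet);
        }
    }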

I believe the reason it's slower is that HashSet checks for duplicates before adding, while a List just shoves it on the end. What surprised me, and what spawned the question and title, is why starting with a List and using it to create a HashSet, like so:

    hashSet = new HashSet<int>(Prime.AllPrimesUnder(1000000));

is faster than using a HashSet internal to the method, enabling a call like this:

    hashSet = Prime.AllPrimesUnder_Hash(1000000);

If the slowdown is in the duplicate checking, it should have to do the same amount of checking no matter what, right? This is likely where my understanding is failing me.

Here are the times I'm getting for primes under one million.

  • 0.1136s Pure Hash
  • 0.0975s Pure List (expected to be faster)
  • 0.0998s Pure List Converted to Hash (not expected)

If the reason for this can be explained in simple terms, I'd love to hear it. I suppose at a minimum what I'm looking for is enough of an understanding to know if I should start with a List or a HashSet if the end result will be a large HashSet of items.

I've added the body of the prime method below, but note that all interaction with the data structure is identical (code-wise) between the two. I don't believe how I add data to the structure should affect the anomaly.

    public static List<int> AllPrimesUnder(int upperLimit)
    {
        List<int> primeList = new List<int>();
        primeList.Add(2);
        int testNumber = 3;
        bool isPrime;

        while (testNumber <= upperLimit)
        {
            isPrime = true;

            foreach (int prime in primeList)
            {
                if (testNumber % prime == 0)
                {
                    isPrime = false;
                    break;
                }
                if (testNumber < prime*prime)
                    break;
            }

            if (isPrime)
                primeList.Add(testNumber);

            testNumber++;
        }

        return primeList;
    }

Edit: By request I am adding the code for the Hash method. If it looks nearly identical, that's because it is.

    public static HashSet<int> AllPrimesUnder_Hash(int upperLimit)
    {
        HashSet<int> primeHash = new HashSet<int>();
        primeHash.Add(2);
        int testNumber = 3;
        bool isPrime;

        while (testNumber <= upperLimit)
        {
            isPrime = true;

            foreach (int prime in primeHash)
            {
                if (testNumber % prime == 0)
                {
                    isPrime = false;
                    break;
                }
                if (testNumber < prime*prime)
                    break;
            }

            if (isPrime)
                primeHash.Add(testNumber);

            testNumber++;
        }

        return primeHash;
    }

Also by request is the (ugly hackish) code I used to test the execution time:

        Stopwatch stopWatch = new Stopwatch();
        int iterations = 1;
        HashSet<int> hashSet = new HashSet<int>();
        List<int> list = new List<int>();

        stopWatch.Restart();
        for (int i = 0; i < iterations; i++)
        {
            hashSet = Prime.AllPrimesUnder_Hash(1000000);
        }
        stopWatch.Stop();

        Console.WriteLine("Hash: " + (stopWatch.Elapsed.TotalSeconds / iterations).ToString("#.###################"));

//////////////////////////

        stopWatch.Restart();
        for (int i = 0; i < iterations; i++)
        {
            hashSet = new HashSet<int>(Prime.AllPrimesUnder(1000000));
        }
        stopWatch.Stop();


        Console.WriteLine("List converted: " + (stopWatch.Elapsed.TotalSeconds / iterations).ToString("#.###################"));

Solution

In AllPrimesUnder you are enumerating the prime list many times (once for each prime candidate). Enumerating a List is faster than enumerating a HashSet because the internal array of the HashSet is more sparse.
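As a rough illustration of that claim, enumeration of a List<int> and a HashSet<int> holding the same values can be timed in isolation (a sketch only; the element count approximates the number of primes under one million, and absolute numbers will vary by runtime and hardware):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class EnumerationSketch
    {
        static void Main()
        {
            const int count = 78498;   // 78,498 = number of primes below one million
            var list = new List<int>();
            var set = new HashSet<int>();
            for (int i = 0; i < count; i++)
            {
                list.Add(i);
                set.Add(i);
            }

            long sum = 0;
            var sw = Stopwatch.StartNew();
            for (int pass = 0; pass < 1000; pass++)
                foreach (int x in list) sum += x;      // walks a dense internal int[]
            sw.Stop();
            Console.WriteLine("List:    " + sw.Elapsed);

            sw.Restart();
            for (int pass = 0; pass < 1000; pass++)
                foreach (int x in set) sum += x;       // walks the set's slot array, skipping empty slots
            sw.Stop();
            Console.WriteLine("HashSet: " + sw.Elapsed);

            Console.WriteLine(sum);                    // keep the loops from being optimized away
        }
    }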

Not having seen the code for AllPrimesUnder_Hash, I would guess that this is the main cause.

I'm not convinced that resizing a list of a few thousand items could consume 20ms. Copying memory using memcpy (which is what happens internally) is one of the highest-throughput operations you can do. You can copy tens of gigabytes per second per core.
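One way to check that is to fill a List<int> with and without an initial capacity, which takes resizing out of the picture entirely (a sketch; 78,498 is the number of primes below one million):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class CapacitySketch
    {
        static void Main()
        {
            const int n = 78498;   // primes below one million

            var sw = Stopwatch.StartNew();
            var grown = new List<int>();        // starts empty and grows its array by doubling as it fills
            for (int i = 0; i < n; i++) grown.Add(i);
            sw.Stop();
            Console.WriteLine("Grown:     " + sw.Elapsed);

            sw.Restart();
            var preSized = new List<int>(n);    // a single allocation, no copying while filling
            for (int i = 0; i < n; i++) preSized.Add(i);
            sw.Stop();
            Console.WriteLine("Pre-sized: " + sw.Elapsed);
        }
    }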
