Use threads when traversing a tree


Problem description


I would like to speed up the process of traversing a tree. Here is an example of a node:

    class Node
    {
        public List<Node> Children { get; set; }
        public int SomeProperty { get; set; }
        public String SomeOtherProperty { get; set; }
    }

The way I traverse the tree is like this:

    static void TraverseTree(Node ParentNode)
    {
        if (ParentNode.Children == null)
            return;

        foreach (var child in ParentNode.Children)
        {
            TraverseTree(child);               
        }
    }

Evaluating ParentNode.Children takes about 1 millisecond, because a Node represents a file or a directory. I just used this example of a node to better illustrate my point.

So if you think about it, if the first node has 4 children and each of those children has 10,000,000 descendants, we could speed up this traversal by traversing each of those 4 children on a separate thread, taking advantage of parallel programming. If that were the scenario, I would take that approach. But if I don't know the structure of the tree in advance, how can I do it?

I have been thinking about:

1) Start traversing the tree, place the first 10 nodes that have children on a stack, then start the traversal of each one on a separate thread.

2) Do something like:

    static void TraverseTree(Node ParentNode)
    {
        if (ParentNode.Children == null)
            return;

        foreach (var child in ParentNode.Children)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback((x) =>
            {                    
                TraverseTree(child);   
            }), null);                            
        }
    }

This often gives me strange results, but it is significantly faster.


Results

Using tasks improved the speed of the algorithm by about 40%. Here are the results:

Scanning my whole C:\ drive took about 5.81 seconds with the following algorithm:

    // directoryPath = "C:\"
    var now = DateTime.Now;

    Task<List<ScanItem>> t1 = new Task<List<ScanItem>>(() =>
    {
        return GetAllFilesInDirectory(directoryPath);
    });

    t1.Start();

    t1.Wait();

    var done = DateTime.Now - now;  // done = 5.81 seconds on average

Scanning my whole C:\ drive took about 3.01 seconds with the following algorithm:

    // directoryPath = "C:\"
    var now = DateTime.Now;

    // Get all directories in my C: drive; it should only contain directories.
    var directories = Directory.GetDirectories(directoryPath);

    // directories = 17 directories: inetpub, MSOCache, PrefLogs, ProgramFiles, ProgramFiles (x86), etc.

    Task<List<ScanItem>>[] myTasks = new Task<List<ScanItem>>[directories.Length];

    // Create a task for each directory in the C:\ drive.
    for (int k = 0; k < myTasks.Length; k++)
    {
        var currentDir = directories[k];
        myTasks[k] = new Task<List<ScanItem>>(() =>
        {
            return GetAllFilesInDirectory(currentDir);
        });
    }

    // Start all the tasks.
    for (int k = 0; k < myTasks.Length; k++)
        myTasks[k].Start();

    Task.WaitAll(myTasks); // wait for all tasks to finish

    var done = DateTime.Now - now;  // average about 3.01 seconds

If I were to traverse the list, the first algorithm returns 318,222 files and directories (that is the correct number). The second algorithm returns 318,195, which is very close; I don't understand why, though...

I am testing this on a computer that has 8 cores. Maybe if I were to run this on a computer with 2 cores, using one task could be faster than creating all these 17 tasks.

If you want to know what algorithm I use to get the files that fast, look at http://stackoverflow.com/a/724184/637142

Solution

Use the Task Parallel Library, rather than rolling your own parallelism code. It is ideally suited to solve this sort of problem.

The way the TPL works is that, rather than assigning threads to a problem yourself, you simply break up the problem into "tasks" and let the TPL take care of figuring out how to parallelize the work amongst a pool of available workers. Just make a task for each sub-branch of the tree; those tasks can in turn spawn off tasks of their own for their sub-branches. The TPL will assign threads out of a pool until the processors are saturated.
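As a rough illustration of the "one unit of work per sub-branch" idea, here is a minimal sketch that counts the nodes of a tree. It uses Parallel.ForEach, which is part of the TPL, rather than creating Task objects by hand. The Node type is a cut-down copy of the one in the question, and the names ParallelTraversal and CountNodes are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Cut-down stand-in for the Node class from the question.
public class Node
{
    public List<Node> Children { get; set; }
}

public static class ParallelTraversal
{
    // Counts every node in the subtree rooted at `node`.
    // Sibling subtrees are handed to the TPL, which decides how many
    // of them actually run at the same time.
    public static long CountNodes(Node node)
    {
        long count = 1; // the node itself
        if (node.Children == null)
            return count;

        Parallel.ForEach(
            node.Children,
            () => 0L,                                                  // per-worker subtotal
            (child, loopState, subtotal) => subtotal + CountNodes(child),
            subtotal => Interlocked.Add(ref count, subtotal));         // merge subtotals safely

        return count;
    }
}
```

Each worker accumulates into its own subtotal and only the final merge touches shared state, so no two threads write to the same variable without synchronization. For very small subtrees the scheduling overhead may outweigh the gain, so it is worth measuring on real data.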

Because of this, it is important to let the TPL know whether your tasks are going to be gated on the CPU or the I/O:

  • If the tasks are CPU-bound then the TPL will assign one pooled thread per CPU and make the other tasks wait until there is a core available; that maximizes throughput and saturates all the processors. That is exactly what you want: if you bought a machine with four processors and two of them are idle then you paid for two cores that you're not using.

  • If a single task is I/O bound then you can use the LongRunning option when creating the task to indicate to the TPL that this task should not consume an entire core; other tasks should be given a turn at that core.

  • If, as it seems is the case, you have many I/O bound tasks then you should consider using a TaskCompletionSource instead, as that allows for more efficient use of "continuation" callbacks. Consider also using the new async/await feature of C# 5 to schedule continuations; it affords a far more pleasant way of writing the asynchronous code.
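The async/await suggestion in the last bullet might look roughly like the sketch below. It assumes that some asynchronous call exists to list the entries under a path; that call is represented by a hypothetical listChildrenAsync delegate, and the names AsyncTraversal and CountAsync are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncTraversal
{
    // Counts `path` plus everything below it.
    // `listChildrenAsync` is a hypothetical delegate standing in for
    // whatever asynchronous I/O call returns the entries directly under a path.
    public static async Task<int> CountAsync(
        string path,
        Func<string, Task<IReadOnlyList<string>>> listChildrenAsync)
    {
        IReadOnlyList<string> children = await listChildrenAsync(path);

        // Start one traversal per child, then await all of them together.
        // The awaits are continuations; they do not block a thread.
        int[] childCounts = await Task.WhenAll(
            children.Select(child => CountAsync(child, listChildrenAsync)));

        return 1 + childCounts.Sum();
    }
}
```

For the single blocking task described in the second bullet, the option can be passed through the existing overload Task.Factory.StartNew(action, TaskCreationOptions.LongRunning).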

And of course, do not forget that if the problem actually is saturating the I/O capability of the machine then no amount of processor parallelism is going to make a dent. If you're filling a swimming pool, adding more hoses to the same faucet doesn't increase the flow through that faucet.
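One way to check whether the disk, rather than the CPU, is the limit is to cap the number of concurrent workers and compare timings at different caps. The sketch below shows only the capping part, using ParallelOptions.MaxDegreeOfParallelism; the helper name RunAndMeasurePeak and the work delegate are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelism
{
    // Runs `work` for every item with at most `maxParallelism` concurrent
    // workers, and returns the highest concurrency actually observed.
    public static int RunAndMeasurePeak(
        IEnumerable<int> items, int maxParallelism, Action<int> work)
    {
        var gate = new object();
        int current = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(items, options, item =>
        {
            lock (gate)
            {
                current++;
                if (current > peak) peak = current;
            }
            try
            {
                work(item);
            }
            finally
            {
                lock (gate) { current--; }
            }
        });

        return peak;
    }
}
```

If raising the cap stops improving the elapsed time well before the core count is reached, that suggests the faucet, not the number of hoses, is the bottleneck.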
