并行树的遍历在C# [英] Parallel tree traversal in C#

查看:198
本文介绍了并行树的遍历在C#的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要快速遍历一棵树,我愿做并行。我宁愿使用并行扩展比手动旋转起来了一堆线程

I need to traverse a tree quickly, and I would like to do it in parallel. I'd rather use the parallel extensions than manually spin up a bunch of threads.

我当前的代码看起来是这样的:

My current code looks something like this:

   public void Traverse(Node root)
    {
        var nodeQueue = new Queue<Node>();
        nodeQueue.Enqueue(root);
        while (nodeQueue.Count!=0)
        {
            var node = nodeQueue.Dequeue();
            if (node.Property = someValue) DoSomething(node);
            foreach (var node in node.Children)
            {
                nodeQueue.Enqueue(node);
            }
        }
    }



我真的很希望并行.ForEach了Parallel.While模拟。我碰到斯蒂芬Toub的文章来到实现并行虽然与Parallel.ForEach 。如果正确地读取它,这仍然是行不通的,因为我是变异我试图遍历队列中。

I was really hoping that Parallel.ForEach had a Parallel.While analog. I came across Stephen Toub's article on Implementing Parallel While with Parallel.ForEach. If read it correctly this still won't work because I am mutating the queue I am trying to iterate.

我是否需要使用任务工厂和递归(和那危险?)? ?还是有我俯瞰一些简单的解决方案。

Do I need to use a task factory and recursion (and is that risky?) ? or is there some simple solution I am overlooking?

编辑:
@svick

@svick

树刚刚超过25万节点。
的最大深度,现在是14个节点,包括深根。

The tree has just over 250,000 nodes. The maximum depth right now is 14 nodes deep including the root.

有关闭根约500个节点,之后的平衡有一个相当随机的分布。
我会很快得到分发一些更好的统计

There are about 500 nodes off the root, and the balance after that has a fairly random distribution. I'll get some better stats on the distribution soon.

@Enigmativity:

@Enigmativity:

是的,树正在被修改,同时受到众多用户,但我通常将有树或子树一个共享读锁或允许脏读。

Yes, the tree is being modified, concurrently by many users, but I will usually have a shared read lock for the tree or sub tree, or allow for dirty reads.

通话要node.Children可以考虑原子。

Calls to node.Children can be considered atomic.

DoSomething的是一些代表的真正之一,对于一些昂贵的操作,我可能会聚集节点的快照列表,并处理它们的穿越之外。

DoSomething is really one of several delegates, for some expensive operations I will probably gather a snapshot list of nodes and process them outside the traversal.

我意识到,我也许应该看看一般情况下(子树被遍历而不是整个树)。为此我遍历跑到树的每个节点上,看着总时间。

I realized that I should probably look at the general case (a sub-tree being traversed instead of the entire tree.) To that end I ran traverse on every node of the tree and looked at the total time.

我用了一个Parallel.ForEach(节点,遍历)每个遍历算法,其中节点包含了所有〜250K节点。这个模拟(排序)很多用户同时请求了很多不同的节点。

I used a Parallel.ForEach(nodes, Traverse) for each traversal algorithm, where nodes contained all ~250k nodes. This simulated (sort of) a lot of users simultaneously requesting a lot of different nodes.

00256ms广度优先顺序

00256ms Breadth First Sequential

00323ms广度优先顺序的工作(我加一个静态计数器作为工作 )

00323ms Breadth First Sequential with work (i incremented a static counter as "work")

01495ms Kirks第一个答案

01495ms Kirks First answer

01143ms Svicks第二个答案

01143ms Svicks Second answer

00000ms递归单线程60后未完成

00000ms Recursive Single Threaded didn't finish after 60s

00000ms Enigmativity的回答60

00000ms Enigmativity's answer didn't finish after 60s

@Enigma,我认为这是可能的我可能会莫名其妙地搞砸了你的alogrithm,因为它看起来像它应该是更快。

@Enigma, I think it's possible I might have messed up your alogrithm somehow, because it seems like it should be much quicker.

结果让我吃惊的说至少。
我不得不添加一些工作广度优先顺序只是为了说服自己,编译器并没有奇迹般地优化掉的遍历。

The results surprised me to say the least. I had to add some work to the breadth first sequential just to convince myself that the compiler wasn't magically optimizing away the traversals.

有关单遍历头部的,并行的第一级只有最佳性能。但只是勉强,这个数字提高了,因为我加入到第二级以上节点(2000年,而不是500)。

For the single traversal of the head, parallelizing the first level only had the best performance. But just barely, this number improved as I added more nodes to the second level (2000 instead of 500).

推荐答案

最直接的方法是创建一个工作每个子节点,然后等待它们全部:

The most direct way would be to create a Task for each child node and then wait for all of them:

public void Traverse(Node root)
{
    if (node.Property == someValue)
        DoSomething(node);

    var tasks = new List<Task>();

    foreach (var node in node.Children)
    {
        // tmp is necessary because of the way closures close over loop variables
        var tmp = node;
        tasks.Add(Task.Factory.StartNew(() => Traverse(tmp)));
    }

    Task.WaitAll(tasks.ToArray());
}



任务是相当重量轻,所以创建很多很多的效果相当好。但是,他们确实有一些开销,所以做一些复杂的东西具有共享队列可能将是更快一些任务。如果这是你要去走的路,不要忘了,空队列并不意味着所有的工作已经完成。从 System.Collections.Concurrent 命名空间的类是要来得心应手,如果你去这条路。

Task is fairly light-weight, so creating lots of them works reasonably well. But they do have some overhead, so doing something more complicated like having a few tasks that share a queue is probably going to be faster. If that's the way you're going to go, don't forget that empty queue doesn't mean all work is done. Classes from the System.Collections.Concurrent namespace are going to come handy if you went this way.

编辑:由于树的形状(根大约有500名儿童),处理只是在平行的第一级应给予良好的表现:

Because of the shape of the tree (the root has about 500 children), processing just the first level in parallel should give good performance:

public void Traverse(Node root, bool parallel = true)
{
    if (node.Property == someValue)
        DoSomething(node);

    if (parallel)
    {
        Parallel.ForEach(node.Children, node =>
        {
            Traverse(node, false);
        });
    }
    else
    {
        foreach (var node in node.Children)
        {
            Traverse(node, false);
        }
    }
}

这篇关于并行树的遍历在C#的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆