Distributed tensorflow parameter server and workers

Problem description

I was closely following the Imagenet distributed TF train example.

I am not able to understand how the distribution of data takes place when this example is run on 2 different workers. In theory, different workers should see different parts of the data. Also, what part of the code tells the parameters to be placed on the parameter server? In the multi-GPU example, there is an explicit section for 'cpu:0'.

Recommended answer

The different workers see different parts of the data by virtue of dequeuing mini-batches of images from a single queue of preprocessed images. To elaborate, in the distributed setup for training the Imagenet model, the input images are preprocessed by multiple threads, and the preprocessed images are stored in a single RandomShuffleQueue. You can look for tf.RandomShuffleQueue in this file to see how this is done. The multiple workers are organized as 'Inception towers', and each tower dequeues a mini-batch of images from the same queue, thus getting a different part of the input. The picture here answers the second part of your question. Look for slim.variables.VariableDeviceChooser in this file. The logic there makes sure that Variable objects are assigned evenly to the workers that act as parameter servers. All the other workers, which do the actual training, fetch the variables at the beginning of a step and update them at the end of the step.
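To make the two mechanisms concrete, below is a minimal sketch using the TensorFlow 1.x API. The queue capacity, image shape, batch size, and number of PS tasks are illustrative assumptions, and RoundRobinVariableChooser is a simplified, hypothetical stand-in for slim.variables.VariableDeviceChooser, not the actual Inception code.

import tensorflow as tf

# Shared input queue: preprocessing threads enqueue images, and every
# tower that calls dequeue_many() removes its own mini-batch, so no two
# towers see the same examples. Sizes here are illustrative assumptions.
images_queue = tf.RandomShuffleQueue(
    capacity=1000,
    min_after_dequeue=100,
    dtypes=[tf.float32],
    shapes=[[224, 224, 3]])
batch_for_this_tower = images_queue.dequeue_many(32)

# Round-robin variable placement: a device function handed to tf.device()
# pins variable-creating ops to the PS tasks in turn, spreading Variable
# objects evenly across parameter servers; all other ops stay on the worker.
class RoundRobinVariableChooser(object):
    def __init__(self, num_ps_tasks):
        self._num_ps_tasks = num_ps_tasks
        self._next_task = 0

    def __call__(self, op):
        if op.type in ('Variable', 'VariableV2'):
            device = '/job:ps/task:%d' % self._next_task
            self._next_task = (self._next_task + 1) % self._num_ps_tasks
            return device
        return '/job:worker'

with tf.device(RoundRobinVariableChooser(num_ps_tasks=2)):
    weights = tf.Variable(tf.zeros([1024, 1000]))  # lands on /job:ps/task:0
    biases = tf.Variable(tf.zeros([1000]))         # lands on /job:ps/task:1

For what it's worth, TensorFlow 1.x also shipped tf.train.replica_device_setter, which implements the same round-robin placement idea as a ready-made device function.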
