What is the reason to use parameter server in distributed tensorflow learning?


Question

Short version: can't we store variables in one of the workers and not use parameter servers?

Long version: I want to implement synchronous distributed learning of a neural network in TensorFlow. I want each worker to have a full copy of the model during training.

I've read the distributed TensorFlow tutorial and the code for distributed ImageNet training, and I didn't get why we need parameter servers.

I see that they are used for storing the values of variables, and that replica_device_setter takes care that variables are evenly distributed across the parameter servers (it probably does something more; I wasn't able to fully understand the code).

The question is: why don't we use one of the workers to store variables? Will I achieve that if I use

with tf.device('/job:worker/task:0/cpu:0'):

instead of

with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):

for the Variables? If that works, is there a downside compared to the solution with parameter servers?
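
For concreteness, here is a minimal sketch of the two placements being compared; the cluster spec, host names, and variable shapes are made up for illustration, following the TF1-style API of the snippets above:

import tensorflow as tf

# Hypothetical cluster; host names are placeholders.
cluster_spec = tf.train.ClusterSpec({
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222"],
})

# (a) Pin every variable to worker 0's CPU, as the question proposes.
with tf.device('/job:worker/task:0/cpu:0'):
    w = tf.Variable(tf.zeros([784, 10]), name="w")

# (b) Let replica_device_setter place variables on the ps job instead.
with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
    b = tf.Variable(tf.zeros([10]), name="b")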

Answer

Using a parameter server can give you better network utilization, and it lets you scale your models to more machines.

A concrete example: suppose you have 250M parameters, it takes 1 second to compute the gradient on each worker, and there are 10 workers. This means each worker has to send/receive 1 GB of data to/from the 9 other workers every second, which would need 72 Gbps of full-duplex network capacity on each worker; that is not practical.
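
Spelling out the arithmetic (this assumes 4-byte float32 parameters, which is what the 1 GB figure implies):

# 250M float32 parameters = 1 GB of gradients per step.
params = 250e6
payload_gb = params * 4 / 1e9        # 1.0 GB per worker per step
peers = 9                            # each of the 10 workers talks to 9 others
print(payload_gb * peers * 8)        # 72.0 Gbps full duplex, per worker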

More realistically, you could have 10 Gbps of network capacity per worker. You prevent network bottlenecks by splitting the parameter server over 8 machines. Each worker machine then communicates with each parameter-server machine for only 1/8th of the parameters.
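
A sketch of that setup, with hypothetical host names. By default replica_device_setter assigns variables to ps tasks in round-robin order, which is what spreads the traffic (it balances variable counts rather than bytes, so the 1/8 split is approximate):

import tensorflow as tf

# Hypothetical cluster: 10 workers, 8 parameter servers.
cluster_spec = tf.train.ClusterSpec({
    "worker": ["worker%d.example.com:2222" % i for i in range(10)],
    "ps": ["ps%d.example.com:2222" % i for i in range(8)],
})

# Each new variable goes to the next ps task in round-robin order, so each
# worker exchanges only ~1/8 of the 1 GB with any single ps machine.
with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)):
    w1 = tf.Variable(tf.zeros([784, 256]))   # -> /job:ps/task:0
    w2 = tf.Variable(tf.zeros([256, 10]))    # -> /job:ps/task:1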
