Train on multiple devices


Problem Description

I know that TensorFlow offers a Distributed Training API for training on multiple devices such as multiple GPUs, CPUs, TPUs, or multiple machines (workers), following this doc: https://www.tensorflow.org/tutorials/distribute/multi_worker_with_keras

But my question is: is there any possible way to split the training using Data Parallelism across multiple machines, including both mobile devices and computers?

I would be really grateful for any tutorial/instruction.

Recommended Answer

As far as I know, TensorFlow only supports CPUs, TPUs, and GPUs for distributed training, and all of the devices must be on the same network.

For connecting multiple devices, as you mentioned, you can follow Multi-worker training.
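As a minimal sketch of what that setup involves (not part of the original answer): each worker machine runs the same script with a TF_CONFIG environment variable describing the whole cluster and its own role in it. The host addresses, port, and task index below are placeholder assumptions you would replace with your own machines:

```python
# Minimal multi-worker setup sketch; addresses and port are placeholders.
import json
import os

# TF_CONFIG must be set before TensorFlow initializes, and each worker
# gets a different "index" (0 on the first machine, 1 on the second).
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": ["192.168.0.10:12345", "192.168.0.11:12345"]  # placeholder hosts
    },
    "task": {"type": "worker", "index": 0},
})

import tensorflow as tf

# All-reduce data parallelism across the workers listed in TF_CONFIG.
strategy = tf.distribute.MultiWorkerMirroredStrategy()
```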

tf.distribute.Strategy is integrated into tf.keras, so when model.fit is used with a tf.distribute.Strategy instance, creating your model under strategy.scope() creates distributed variables, which allows the strategy to divide your input data equally across your devices. You can follow this tutorial for more details.
Distributed input could also help you.
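To illustrate that pattern, here is a minimal runnable sketch (the toy model and random data are assumptions for demonstration): variables created under strategy.scope() become distributed variables, and model.fit splits each global batch across the replicas.

```python
import tensorflow as tf

# MirroredStrategy replicates across all local GPUs; swap in the
# multi-worker strategy from the sketch above to span machines.
strategy = tf.distribute.MirroredStrategy()

# Variables created inside scope() are mirrored across the replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Scale the batch size so each replica still sees 64 examples;
# the strategy splits every global batch evenly across devices.
global_batch_size = 64 * strategy.num_replicas_in_sync
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 10]), tf.random.normal([1024, 1]))
).batch(global_batch_size)

# model.fit distributes the input and aggregates gradients automatically.
model.fit(dataset, epochs=2)
```

If you write a custom training loop instead of using model.fit, strategy.experimental_distribute_dataset can wrap a tf.data.Dataset into per-replica input, which is what the Distributed input guide covers.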

