Best way to import data in google-colaboratory for fast computing and training?


Problem description

I am running a simple deep learning model on Google Colab, but it's running slower than it does on my MacBook Air, which has no GPU.

I read this question and found out the problem is that the dataset is imported over the internet, but I can't figure out how to speed this process up.

My model can be found here. Any idea how I can make each epoch faster?

My local machine takes 0.5-0.6 seconds per epoch, while Google Colab takes 3-4 seconds.

Answer

Is a GPU always faster than a CPU? No. Why? Because the speedup a GPU provides depends on a few factors:


  1. How much of your code executes in parallel, i.e. how much of it creates threads that run in parallel. Keras takes care of this automatically, so it should not be a problem in your scenario.

  2. The time spent sending data between the CPU and the GPU. This is where people often go wrong: it is assumed that the GPU will always outperform the CPU, but if the data being passed is too small, the time the computation itself takes (the number of computation steps required) is smaller than the time needed to split the data/process into threads, execute them on the GPU, and then recombine the results on the CPU.

The second scenario looks likely in your case, since you used a batch_size of 5: classifier = KerasClassifier(build_fn=build_classifier, epochs=100, batch_size=5). If your dataset is big enough, increasing the batch_size will improve the GPU's performance relative to the CPU.
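The per-batch overhead argument can be sketched with a toy timing model (pure Python; the overhead and per-sample compute figures below are made-up illustrative numbers, not measurements from Colab):

```python
import math

def epoch_time(n_samples, batch_size, per_batch_overhead, per_sample_compute):
    """Toy model: every batch pays a fixed launch/transfer overhead,
    plus compute time proportional to the number of samples."""
    n_batches = math.ceil(n_samples / batch_size)
    return n_batches * per_batch_overhead + n_samples * per_sample_compute

# Hypothetical figures: 10,000 samples, 1 ms of fixed overhead per batch,
# 10 microseconds of GPU compute per sample.
print(epoch_time(10_000, 5, 1e-3, 1e-5))    # 2,000 batches: overhead dominates
print(epoch_time(10_000, 256, 1e-3, 1e-5))  # 40 batches: compute dominates
```

With batch_size=5 the fixed per-batch cost is paid 2,000 times per epoch, which matches the symptom in the question: the GPU spends most of each epoch on launch and transfer overhead rather than on useful computation.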

Other than that, you have used a fairly simple model, and as @igrinis pointed out, the data is loaded from the drive into memory only once, so in theory the problem should not be loading time.

