Migrate from running ML training and testing locally to Google Cloud

Question

I currently have a simple Machine Learning infrastructure running locally and I want to migrate this all onto Google Cloud. I simply fetch the data I need from a database, build my model and then test the model on test data. This is all done in PyCharm locally.

I want to migrate this so that all of it can run on Google Cloud, while keeping the flexibility to make changes locally that also apply when the code runs in the cloud. There are many Google Cloud resources related to this, so I am looking for the best practices people follow for this kind of workflow.

Thanks, and please let me know if any clarification is needed.

Recommended answer

I highly suggest you take a look at this machine learning workflow in the cloud, which consists of:

  • Data ingestion and collection
  • Storing data
  • Processing data
  • Machine learning training
  • ML deployment
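
As a rough illustration of how these stages map onto code you can already run locally, here is a minimal stdlib-only sketch. It assumes a toy regression problem; the generated data, the `samples` table, and all function names are hypothetical, and the deployment stage is left out. Keeping each stage in its own function means one stage can later be swapped for its Google Cloud counterpart (for example, the SQLite store for Cloud SQL) without touching the others.

```python
import random
import sqlite3


def ingest(n=200, seed=0):
    """Hypothetical ingestion step: noisy samples of y = 3x + 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3 * x + 2 + rng.gauss(0, 0.5)))
    return rows


def store(rows, db_path=":memory:"):
    """Storage step: SQLite here; a managed database such as Cloud SQL in the cloud."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS samples (x REAL, y REAL)")
    conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)
    conn.commit()
    return conn


def process(conn, test_fraction=0.2):
    """Processing step: read the rows back and split them into train/test sets."""
    data = conn.execute("SELECT x, y FROM samples ORDER BY rowid").fetchall()
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]


def train(train_rows):
    """Training step: fit y = a*x + b by ordinary least squares."""
    n = len(train_rows)
    mean_x = sum(x for x, _ in train_rows) / n
    mean_y = sum(y for _, y in train_rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train_rows)
    var = sum((x - mean_x) ** 2 for x, _ in train_rows)
    a = cov / var
    return a, mean_y - a * mean_x


def evaluate(model, test_rows):
    """Testing step: mean squared error on the held-out rows."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in test_rows) / len(test_rows)


def run_pipeline(db_path=":memory:"):
    """Run all stages in order and return (model, test MSE)."""
    conn = store(ingest(), db_path)
    try:
        train_rows, test_rows = process(conn)
        model = train(train_rows)
        return model, evaluate(model, test_rows)
    finally:
        conn.close()
```

Calling `run_pipeline()` returns the fitted `(a, b)` pair and the test error; only `store` would need to change to point at a cloud database.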

There are multiple resources you can use to ingest data on Google Cloud Platform. The simplest solutions I can recommend are Google Compute Engine or an App Engine app (for example, a web form where users fill in data).

Alternatively, if you need to ingest data in real time, you can use Cloud Pub/Sub.
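
A minimal sketch of real-time ingestion with the `google-cloud-pubsub` client library, assuming records are plain dicts sent as JSON. The project and topic IDs are placeholders, and the encode/decode helpers are hypothetical conventions, not part of the library. Pub/Sub message data must be bytes, which is why records are serialized first.

```python
import json


def encode_record(record):
    """Serialize one record as UTF-8 JSON bytes (Pub/Sub message data must be bytes)."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_record(data):
    """Inverse of encode_record, e.g. for a subscriber or a downstream job."""
    return json.loads(data.decode("utf-8"))


def publish_record(project_id, topic_id, record):
    """Publish one record; needs the google-cloud-pubsub package and GCP credentials."""
    from google.cloud import pubsub_v1  # lazy import so the helpers above work without it

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, encode_record(record))
    return future.result()  # blocks until Pub/Sub returns the message ID
```

For example, `publish_record("my-project", "training-data", {"x": 1.5, "y": 6.4})`, where both IDs are hypothetical.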

As you mentioned, you are retrieving all the information from a database. If you are used to working with SQL, I highly suggest Cloud SQL (managed MySQL, PostgreSQL, or SQL Server). Not only does it provide a good interface for setting up your instance, it also lets you access it securely and quickly.
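
One way to keep local changes working in the cloud is to write the data-fetching code against the generic DB-API 2.0 interface, so the same function runs on a local SQLite file and on Cloud SQL. This is a sketch under assumptions: the Cloud SQL path assumes PostgreSQL with the `pg8000` driver and a `google.cloud.sql.connector.Connector` from the `cloud-sql-python-connector` package, and the instance name and credentials are placeholders. Note that drivers differ in parameter placeholder style (`?` for sqlite3, `%s` for pg8000), so the example query uses no parameters.

```python
import sqlite3


def fetch_rows(conn, query):
    """Run a read-only query on any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


def connect_local(path=":memory:"):
    """Local development database."""
    return sqlite3.connect(path)


def connect_cloud_sql(connector, instance_connection_name, user, password, db):
    """Open a Cloud SQL for PostgreSQL connection through the Cloud SQL Python Connector.

    `connector` is a google.cloud.sql.connector.Connector; call connector.close()
    once you are done with all connections.
    """
    return connector.connect(
        instance_connection_name,  # "project:region:instance"
        "pg8000",
        user=user,
        password=password,
        db=db,
    )
```

The training code then only ever calls `fetch_rows(conn, "SELECT x, y FROM samples")`; which `connect_*` function produces `conn` can be decided by configuration.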

If that is not the case, you can also use Google Cloud Storage or BigQuery. Of the two, I would pick BigQuery, since it can also handle streaming data.
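
A minimal sketch of both directions with the `google-cloud-bigquery` client: streaming rows in with `insert_rows_json` and reading training data back with a query. The column names and the `"project.dataset.table"` ID are hypothetical, and the table is assumed to already exist with a matching schema.

```python
import datetime


def to_bq_row(x, y, ts=None):
    """Build a JSON-serializable row; TIMESTAMP values are sent as ISO 8601 strings."""
    ts = ts or datetime.datetime.now(datetime.timezone.utc)
    return {"x": float(x), "y": float(y), "ingested_at": ts.isoformat()}


def stream_rows(table_id, rows):
    """Stream rows into an existing table; needs google-cloud-bigquery and credentials."""
    from google.cloud import bigquery  # lazy import: optional dependency

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)  # table_id: "project.dataset.table"
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")


def load_training_rows(query):
    """Run a SQL query and return the results as plain tuples."""
    from google.cloud import bigquery

    client = bigquery.Client()
    return [tuple(row.values()) for row in client.query(query).result()]
```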

To process data before feeding it to the model, you can use any of the following:

  • Cloud Dataflow: Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness, with no complex workarounds or compromises needed.
  • Cloud Dataproc: Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
  • Cloud Dataprep: Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning.
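
Cloud Dataflow runs Apache Beam pipelines, which also fits the "change locally, run in the cloud" requirement: the same pipeline code runs on your machine with Beam's default DirectRunner and on Dataflow when given Dataflow pipeline options. A minimal sketch, assuming hypothetical `x,y` CSV input; the parsing functions are plain Python so they can be tested without Beam installed.

```python
def parse_line(line):
    """Parse a hypothetical "x,y" CSV line; return None for malformed input."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None


def format_row(row):
    """Turn a parsed (x, y) pair back into a CSV line."""
    x, y = row
    return f"{x},{y}"


def run(input_path, output_prefix, argv=None):
    """Clean the input; runs locally by default, or on Dataflow via pipeline options."""
    import apache_beam as beam  # lazy import: optional dependency
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv or [])
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Parse" >> beam.Map(parse_line)
            | "DropMalformed" >> beam.Filter(lambda row: row is not None)
            | "Format" >> beam.Map(format_row)
            | "Write" >> beam.io.WriteToText(output_prefix)
        )
```

To target Dataflow instead of the local runner, pass options such as `["--runner=DataflowRunner", "--project=PROJECT", "--region=REGION", "--temp_location=gs://BUCKET/tmp"]` (placeholder values) as `argv`, with `gs://` paths for input and output.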

For training and deploying your ML model, I would suggest using AI Platform.

AI Platform makes it easy for machine learning developers, data scientists, and data engineers to take their ML projects from ideation to production and deployment, quickly and cost-effectively.
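
AI Platform Training runs your code as a Python package, and when a job is submitted with `--job-dir`, that value is passed to the trainer as a `--job-dir` argument. A minimal sketch of such a trainer module, assuming a toy linear model trained by gradient descent; the `--learning-rate` and `--epochs` flags and the `model.json` output are hypothetical. Because it only reads command-line flags, the same module can be run locally first.

```python
import argparse
import json
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Filled in by AI Platform Training from the --job-dir given to gcloud.
    parser.add_argument("--job-dir", default="output")
    # Hypothetical hyperparameters, passed after the `--` separator on the gcloud command line.
    parser.add_argument("--learning-rate", type=float, default=0.02)
    parser.add_argument("--epochs", type=int, default=2000)
    args, _unknown = parser.parse_known_args(argv)  # ignore any extra flags
    return args


def fit(rows, learning_rate, epochs):
    """Fit y = a*x + b with batch gradient descent on mean squared error."""
    a = b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in rows) / n
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


def main(argv=None):
    args = parse_args(argv)
    rows = [(x, 3 * x + 2) for x in range(10)]  # stand-in for real training data
    a, b = fit(rows, args.learning_rate, args.epochs)
    # Plain open() handles local paths only; a gs:// job dir needs e.g. tf.io.gfile.
    os.makedirs(args.job_dir, exist_ok=True)
    path = os.path.join(args.job_dir, "model.json")
    with open(path, "w") as f:
        json.dump({"a": a, "b": b}, f)
    return path
```

Saved as `trainer/task.py` (with an empty `trainer/__init__.py` and an `if __name__ == "__main__": main()` guard), it could be submitted with something like `gcloud ai-platform jobs submit training my_job --package-path trainer --module-name trainer.task --region us-central1 --job-dir gs://my-bucket/my_job`, where the job and bucket names are hypothetical.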

If you have to work with huge datasets, the best practice is to run the model as a TensorFlow training job on AI Platform, so that you can use a training cluster.
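
In a distributed job, AI Platform Training sets the `TF_CONFIG` environment variable on each replica, describing the cluster and that replica's own task. TensorFlow's distribution strategies read it themselves, but your own code often needs it too, for example so that only the coordinating replica exports the model. A minimal sketch, assuming the coordinator may be named either `"master"` or `"chief"` depending on the runtime version; a missing `TF_CONFIG` (a local run) is treated as a single coordinator.

```python
import json
import os


def cluster_role(env=None):
    """Return (task_type, task_index, is_chief) derived from TF_CONFIG."""
    env = os.environ if env is None else env
    raw = env.get("TF_CONFIG")
    if not raw:
        # No cluster description: a local, single-process run.
        return "chief", 0, True
    task = json.loads(raw).get("task", {})
    task_type = task.get("type", "chief")
    task_index = int(task.get("index", 0))
    # Depending on the runtime version, the coordinator is called "master" or "chief".
    return task_type, task_index, task_type in ("chief", "master")
```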

Finally, for deploying your models with AI Platform, you can take a look here.
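
Once a model version is deployed, online predictions can be requested through the `ml` v1 API with `google-api-python-client`. A minimal sketch, assuming the model is served from the global endpoint and that the project, model, and version names are placeholders; the shape of each instance depends on your model's input signature.

```python
def model_resource_name(project, model, version=None):
    """Build the resource name; without a version, the model's default version is used."""
    name = f"projects/{project}/models/{model}"
    if version is not None:
        name += f"/versions/{version}"
    return name


def predict(project, model, instances, version=None):
    """Request online predictions; needs google-api-python-client and credentials."""
    import googleapiclient.discovery  # lazy import: optional dependency

    service = googleapiclient.discovery.build("ml", "v1")
    response = (
        service.projects()
        .predict(
            name=model_resource_name(project, model, version),
            body={"instances": instances},
        )
        .execute()
    )
    if "error" in response:
        raise RuntimeError(response["error"])
    return response["predictions"]
```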
