Submitting TensorFlow estimator as run to experiment

Views: 69 Posted: 2019/6/15 0:19:19 AzureMachineLearningService


Problem description

Hello.

I've created a training script for a model and I've run it on a compute cluster in Azure ML Services.

It works fine. However, I'm now trying to move the exact same setup to another Azure subscription. For some reason, when I submit the run, nothing happens. Are there any prerequisites or permissions that need to be in place in order to submit runs? I can create the compute cluster without problems.

I'm using the Python SDK with the following code:

from azureml.core.workspace import Workspace
import azureml.core
import os

ws = Workspace.from_config()
print('Workspace name: ' + ws.name,
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep='\n')

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "gpucluster3"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           min_nodes=0,
                                                           max_nodes=2)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

print(compute_target.get_status().serialize())

from azureml.core import Experiment

experiment_name = 'test'
experiment = Experiment(ws, name=experiment_name)

from azureml.train.dnn import TensorFlow

# Not all parameter definitions are included here (e.g. ds_data and
# project_folder), but they are defined.
script_params = {'--data_dir': ds_data.as_mount()}

estimator = TensorFlow(source_directory=project_folder,
                       compute_target=compute_target,
                       script_params=script_params,
                       entry_script='train-hov.py',
                       pip_packages=['keras==2.1.2', 'h5py'],
                       node_count=2,
                       process_count_per_node=1,
                       distributed_backend='mpi',
                       use_gpu=True)

# The script hangs on this call.
run = experiment.submit(estimator)

When I run the script, it gets stuck at the last line, and the run doesn't get submitted to the experiment.
Is there a way for me to debug this? Or which resource providers or the like need to be set up?

Hope you can help.
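
On the debugging question, one generic way to find out where a blocking call such as `experiment.submit(estimator)` is stuck is to have Python print the stack traces of all threads if the call has not returned after a timeout. The sketch below uses only the Python standard library (`faulthandler` and `logging`); it does not rely on any Azure ML SDK feature. `call_with_stack_dump` and its `timeout_s` parameter are hypothetical names introduced here for illustration, and the commented usage line assumes the `experiment` and `estimator` objects from the script above.

```python
import faulthandler
import logging
import sys

# Show DEBUG-level messages from any library that logs through the
# standard logging module.
logging.basicConfig(level=logging.DEBUG, stream=sys.stderr)


def call_with_stack_dump(fn, *args, timeout_s=60.0, **kwargs):
    """Call fn(*args, **kwargs) and return its result.

    If the call is still running after timeout_s seconds, the tracebacks
    of all threads are written to stderr. The call itself is not
    interrupted; the dump only shows where it is waiting.
    """
    # sys.__stderr__ is the interpreter's original stderr, which has a
    # real file descriptor as faulthandler requires.
    faulthandler.dump_traceback_later(timeout_s, exit=False,
                                      file=sys.__stderr__)
    try:
        return fn(*args, **kwargs)
    finally:
        # Cancel the pending dump once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()


# Hypothetical usage with the names from the script above:
# run = call_with_stack_dump(experiment.submit, estimator, timeout_s=120)
```

The traceback shows which function the call is waiting in (for example a network request or a file upload), which narrows down whether the problem is on the client side or in the target subscription.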
