AMLS Experiment run stuck in status "Running"


Question


I ran an Azure Machine Learning Service experiment and logged neural network losses from a Jupyter Notebook. Logging worked fine, and NN training completed as it should. However, the experiment run is stuck in the "Running" status. Shutting down the compute resources does not end the run, and I cannot cancel it from the Experiment panel. In addition, the run does not have any log files.


Has anyone seen the same behavior? The run has now lasted for over 24 hours.

Answer


This happens from time to time, and it is certainly frustrating, especially because the "Cancel" button is grayed out. You can use either the CLI or the Python SDK to cancel the run.


As of SDK version 1.16.0, an Experiment object is no longer needed. Instead, you can access the run directly through the Workspace object (or the Run class):

from azureml.core import Workspace, Run, VERSION
print("SDK version:", VERSION)

ws = Workspace.from_config()  # loads the workspace described in config.json

run = ws.get_run('YOUR_RUN_ID')
# run = Run.get(ws, 'YOUR_RUN_ID')  # also works: Run.get is a static method
run.cancel()
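Before cancelling, it can help to check what status the service currently reports for the run. The following is a minimal sketch, assuming a `Run` object obtained as above: `get_status()` and `cancel()` are real `azureml.core.Run` methods, but `cancel_if_active` is a hypothetical helper name, and the set of "active" status strings is our assumption based on the AzureML (v1) run lifecycle, so adjust it if your SDK reports other values.

```python
from typing import Protocol

# Assumption: status strings an AzureML (v1) run may report before it reaches
# a terminal state such as "Completed", "Failed" or "Canceled".
ACTIVE_STATUSES = frozenset(
    {"NotStarted", "Queued", "Preparing", "Provisioning",
     "Starting", "Running", "Finalizing"}
)


class RunLike(Protocol):
    """The two azureml.core.Run methods this helper relies on."""

    def get_status(self) -> str: ...

    def cancel(self) -> None: ...


def cancel_if_active(run: RunLike) -> bool:
    """Hypothetical helper: send a cancel request only if `run` still looks active.

    Returns True if cancel() was called, False if the run was already in a
    status outside ACTIVE_STATUSES.
    """
    status = run.get_status()
    if status in ACTIVE_STATUSES:
        run.cancel()
        return True
    return False
```

With the SDK installed, usage would look like `cancel_if_active(ws.get_run('YOUR_RUN_ID'))`. Typing the parameter as a small protocol keeps the helper independent of the SDK version, so it works with a run obtained either way shown in this answer.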

For SDK versions earlier than 1.16.0:

from azureml.core import Workspace, Experiment, Run, VERSION
print("SDK version:", VERSION)

ws = Workspace.from_config()
exp = Experiment(workspace=ws, name='YOUR_EXP_NAME')  # experiment that owns the stuck run

run = Run(exp, run_id='YOUR STEP RUN ID')

run.cancel()  # or run.fail() to mark the run as failed instead
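If the same notebook has to run against both older and newer SDKs, you could branch on `azureml.core.VERSION`. A minimal sketch of the version check follows; `needs_experiment_object` is a hypothetical name, and it assumes a plain numeric "MAJOR.MINOR.PATCH" version string. Only the first three dot-separated parts are compared, so suffixes such as `rc1` attached to those parts would not parse.

```python
def needs_experiment_object(sdk_version: str) -> bool:
    """Return True if the SDK version predates 1.16.0.

    Assumes strings like "1.15.0" or "1.0.85.post1"; parts after the third
    are ignored, and missing parts are treated as 0.
    """
    nums = [int(part) for part in sdk_version.split(".")[:3]]
    nums += [0] * (3 - len(nums))  # pad "1.16" to [1, 16, 0]
    return tuple(nums) < (1, 16, 0)


# Usage sketch (requires azureml-core, so it is not executed here):
# from azureml.core import Workspace, Experiment, Run, VERSION
# ws = Workspace.from_config()
# if needs_experiment_object(VERSION):
#     run = Run(Experiment(ws, 'YOUR_EXP_NAME'), run_id='YOUR_RUN_ID')
# else:
#     run = ws.get_run('YOUR_RUN_ID')
# run.cancel()
```

Padding to three components matters: without it, `(1, 16) < (1, 16, 0)` is true in Python, which would misclassify a "1.16" string as an old SDK.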

CLI

More details on the CLI are in the Azure Machine Learning CLI documentation.

az login
az ml run cancel --run YOUR_RUN_ID
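If several runs are stuck, you could script the same command from Python. This is a minimal sketch: `cancel_commands` and `cancel_runs` are hypothetical helpers, and it assumes you have already run `az login` and that the CLI's ML extension knows which workspace to target (workspace and resource-group selection are not shown). It reuses only the `az ml run cancel --run` form from above and defaults to a dry run that just prints the commands.

```python
import shlex
import subprocess
from typing import Iterable, List


def cancel_commands(run_ids: Iterable[str]) -> List[List[str]]:
    """Build one `az ml run cancel --run <id>` argument list per run ID."""
    return [["az", "ml", "run", "cancel", "--run", run_id] for run_id in run_ids]


def cancel_runs(run_ids: Iterable[str], dry_run: bool = True) -> None:
    """Print (dry_run=True) or execute each cancel command in turn."""
    for cmd in cancel_commands(run_ids):
        if dry_run:
            print(shlex.join(cmd))  # show exactly what would be executed
        else:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)
```

For example, `cancel_runs(['RUN_ID_1', 'RUN_ID_2'])` prints the two commands; pass `dry_run=False` once they look right. Passing an argument list instead of a single shell string avoids quoting problems with unusual run IDs.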
