Transfer from ADLS2 to Compute Target very slow (Azure Machine Learning)

Question

During a training script executed on a compute target, we're trying to download a registered Dataset from an ADLS2 Datastore. The problem is that it takes hours to download ~1.5 GB (split into ~8500 files) to the compute target with the following method:

from azureml.core import Datastore, Dataset, Run, Workspace

# Retrieve the run context to get Workspace
RUN = Run.get_context(allow_offline=True)

# Retrieve the workspace
ws = RUN.experiment.workspace

# Creating the Dataset object based on a registered Dataset
dataset = Dataset.get_by_name(ws, name='my_dataset_registered')

# Download the Dataset locally
dataset.download(target_path='/tmp/data', overwrite=False)

Important note: the Dataset is registered to a path in the data lake that contains many subfolders (and sub-subfolders, and so on), each holding small files of around 170 KB.

Note: I'm able to download the complete dataset to my local computer within a few minutes using AzCopy or the Storage Explorer. Also, the Dataset is defined at the folder level, with the ** wildcard used to scan subfolders: datalake/relative/path/to/folder/**

Is this a known issue? How can I improve the transfer speed?

Thanks!

Recommended answer

Edited to read more like an answer:

It would help to know: which versions of the azureml-core and azureml-dataprep SDKs you are using, what type of VM your compute instance runs on, and what types of files (e.g. jpg? txt?) your dataset contains. Also, what are you trying to achieve by downloading the complete dataset to your compute?
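One way to answer the file-type question is to summarize whatever has already landed in the download folder. The following is a minimal sketch using only the Python standard library; the helper name summarize_tree is hypothetical, and '/tmp/data' is assumed to be the target_path from the question.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Walk `root` and return file count, total bytes, and a count per extension."""
    file_count = 0
    total_bytes = 0
    by_extension = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            file_count += 1
            total_bytes += os.path.getsize(path)
            # Files without an extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_extension[ext] += 1
    return {
        "file_count": file_count,
        "total_bytes": total_bytes,
        "by_extension": dict(by_extension),
    }


if __name__ == "__main__":
    # Assumption: '/tmp/data' is the download target used in the question.
    print(summarize_tree("/tmp/data"))
```

Including the file count, total size, and extension breakdown in a bug report makes it easier for someone else to reproduce the many-small-files scenario.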

Currently, the compute instance image comes with azureml-core 1.0.83 and azureml-dataprep 1.1.35 pre-installed, which are 1-2 months old. You might be using even older versions. You can try upgrading by running this in your notebook:

%pip install -U azureml-sdk
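To see which versions are actually installed, before and after the upgrade, something like the sketch below may help. It uses importlib.metadata from the Python standard library (Python 3.8 or later is assumed), not an Azure ML API, and the helper name installed_versions is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The two packages named in the answer.
    found = installed_versions(["azureml-core", "azureml-dataprep"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it in the same kernel or environment as the training script, since a notebook and a remote run can resolve different package versions.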

If that doesn't improve things, you can file an issue on an official docs page, such as the reference page for FileDataset, to get someone to help debug the problem.

(Edited on June 9, 2020 to remove the mention of the experimental release, since that is no longer happening.)
