Disk I/O extremely slow on P100-NC6s-V2

Question

I am training an image segmentation model on an Azure ML pipeline. During the testing step, I save the output of the model to the associated blob storage. I then want to compute the IOU (Intersection over Union) between the calculated output and the ground truth. Both sets of images are on the blob storage. However, the IOU calculation is extremely slow, and I think it is disk bound. In my IOU calculation code I am only loading the two images (the rest of the code is commented out), yet it still takes close to 6 seconds per iteration, while training and testing were fast enough.

Is this behavior normal? How do I debug this step?

Recommended answer

A few notes on the drives that an AzureML remote run has available:

Here is what I see when I run df on a remote run (in this one, I am using a blob Datastore via as_mount()):

Filesystem                             1K-blocks     Used  Available Use% Mounted on
overlay                                103080160 11530364   86290588  12% /
tmpfs                                      65536        0      65536   0% /dev
tmpfs                                    3568556        0    3568556   0% /sys/fs/cgroup
/dev/sdb1                              103080160 11530364   86290588  12% /etc/hosts
shm                                      2097152        0    2097152   0% /dev/shm
//danielscstorageezoh...-620830f140ab 5368709120  3702848 5365006272   1% /mnt/batch/tasks/.../workspacefilestore
blobfuse                               103080160 11530364   86290588  12% /mnt/batch/tasks/.../workspaceblobstore

The interesting items are overlay, /dev/sdb1, //danielscstorageezoh...-620830f140ab and blobfuse:

  1. overlay and /dev/sdb1 are both the mount of the local SSD on the machine (I am using a STANDARD_D2_V2 which has a 100GB SSD).
  2. //danielscstorageezoh...-620830f140ab is the mount of the Azure File Share that contains the project files (your script, etc.). It is also the current working directory for your run.
  3. blobfuse is the blob store that I had requested to mount in the Estimator as I executed the run.
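
To see from inside a run which filesystem a given directory sits on, a small check like the one below may help. This is a minimal sketch and not part of the original answer: the helper name describe_path is hypothetical, and it only reports the size of the filesystem behind a path, so two paths with different totals are on different drives.

```python
import os
import shutil
import tempfile


def describe_path(path):
    """Report the size of the filesystem that holds `path`, in GiB."""
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return {
        "path": os.path.abspath(path),
        "total_gib": round(usage.total / gib, 2),
        "used_gib": round(usage.used / gib, 2),
        "free_gib": round(usage.free / gib, 2),
    }


if __name__ == "__main__":
    # On the run described above, the working directory would be the
    # Azure File Share and the temp directory the local SSD. On other
    # machines both may well be the same filesystem.
    for candidate in (os.getcwd(), tempfile.gettempdir()):
        print(describe_path(candidate))
```

Comparing the output for the working directory and for the temp directory gives a quick hint about whether a script is reading and writing on a network mount or on the local disk.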

I was curious about the performance differences between these 3 types of drives. My mini benchmark was to download and extract this file: http://download.tensorflow.org/example_images/flower_photos.tgz (a 220 MB tar file that contains about 3600 JPEG images of flowers).
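
The extraction half of such a benchmark can be approximated without any download. The following is a rough sketch, not the script that was actually used (that one is linked further down): it builds a synthetic archive of many small files as a stand-in for flower_photos.tgz and times how long it takes to extract it into each target directory. The function names and the file count and size are assumptions chosen for illustration.

```python
import io
import os
import tarfile
import tempfile
import time


def make_sample_tar(tar_path, n_files=200, size=4096):
    """Create a .tgz holding many small files (stand-in for the real dataset)."""
    payload = os.urandom(size)
    with tarfile.open(tar_path, "w:gz") as tar:
        for i in range(n_files):
            info = tarfile.TarInfo(name=f"sample/img_{i:04d}.bin")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def time_extract(tar_path, dest_dir):
    """Extract the archive into dest_dir and return the elapsed seconds."""
    start = time.perf_counter()
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return time.perf_counter() - start


def benchmark(target_dirs, n_files=200):
    """Time extraction into a throwaway subdirectory of each target."""
    results = {}
    with tempfile.TemporaryDirectory() as scratch:
        tar_path = os.path.join(scratch, "sample.tgz")
        make_sample_tar(tar_path, n_files=n_files)
        for target in target_dirs:
            # The subdirectory is removed again when the block exits.
            with tempfile.TemporaryDirectory(dir=target) as dest:
                results[target] = time_extract(tar_path, dest)
    return results


if __name__ == "__main__":
    # Add the mount points of interest (file share, blobfuse) to this list.
    for target, seconds in benchmark([tempfile.gettempdir()]).items():
        print(f"{target}: {seconds:.2f}s")
```

Passing the local temp directory together with a directory on each network mount gives a per-drive comparison for the many-small-files case.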

The results were as follows:

Filesystem/Drive         Download_and_save       Extract
Local_SSD                               2s            2s  
Azure File Share                        9s          386s
Premium File Share                     10s          120s
Blobfuse                               10s          133s
Blobfuse w/ Premium Blob                8s          121s

In summary, writing small files is much, much slower on the network drives than on the local SSD, so if you are writing many small files it is highly recommended to write them to /tmp or to a directory created with Python's tempfile module.
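
One way to apply this advice is to write the small files to a local temporary directory first and then store them on the network drive as a single archive. The sketch below assumes this staging pattern is acceptable for your outputs; the function name and the zip format are illustrative choices, not something the original answer prescribes.

```python
import os
import shutil
import tempfile


def write_locally_then_archive(files, network_dir, archive_name="outputs"):
    """Write small files to local temp storage, then save one zip remotely.

    `files` maps a file name to its bytes. `network_dir` is the slow
    destination (for example a mounted file share or blobfuse path).
    Returns the path of the archive that was created.
    """
    with tempfile.TemporaryDirectory() as local_dir:
        # Many small writes happen on the local disk only.
        for name, data in files.items():
            with open(os.path.join(local_dir, name), "wb") as fh:
                fh.write(data)
        # The network drive then receives a single larger file.
        base = os.path.join(network_dir, archive_name)
        return shutil.make_archive(base, "zip", root_dir=local_dir)


if __name__ == "__main__":
    # A second temp directory stands in for the network mount here.
    with tempfile.TemporaryDirectory() as fake_mount:
        path = write_locally_then_archive({"a.png": b"\x00" * 10}, fake_mount)
        print(path)
```

The same idea works in the other direction: copying the inputs to a local directory once and reading them from there avoids repeated small reads over the network.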

For reference, here is the script I ran to take the measurements: https://gist.github.com/danielsc/9f062da5e66421d48ac5ed84aabf8535

And this is how I ran it: https://gist.github.com/danielsc/6273a43c9b1790d82216bdaea6e10e5c
