File Not Found Error in Dask program run on cluster

Question

I have four machines: M1, M2, M3, and M4. The scheduler, the client, and a worker run on M1, and I've put a CSV file on M1. The remaining machines (M2, M3, and M4) are workers.

When I run a program that loads this file with read_csv in Dask, it fails with a file-not-found error.
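A quick way to confirm the cause is to ask every worker whether it can see the file from its own filesystem, since the read_csv tasks execute on the workers rather than on the client. The sketch below assumes dask.distributed is installed and that you already have a connected Client; the helper names path_visibility and workers_missing are hypothetical, while Client.run is the real Dask method that runs a function once on each worker.

```python
import os

from dask.distributed import Client


def path_visibility(client: Client, path: str) -> dict:
    """Ask every worker whether `path` exists on that worker's own filesystem.

    Client.run executes the function once on each worker and returns a
    dict mapping worker address -> the function's return value.
    """
    return client.run(os.path.exists, path)


def workers_missing(client: Client, path: str) -> list:
    """Return the sorted addresses of workers that cannot see `path`."""
    visibility = path_visibility(client, path)
    return sorted(addr for addr, found in visibility.items() if not found)
```

For example, with a client connected to a scheduler on M1 (the address, such as tcp://M1:8786, is a placeholder) and the CSV's path on M1, one would expect workers_missing to list the workers on M2, M3, and M4.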

Answer

When a worker on another machine tries to load the CSV, it cannot find it, because the file is not present on that machine's local disk. This is expected: each worker reads from its own filesystem. You can get around this in a number of ways:

  • copy the file to every worker at the same path; this is obviously wasteful in terms of disk space, but the easiest to achieve
  • place the file on a networked filesystem (NFS mount, Gluster, HDFS, etc.) that every worker can reach
  • place the file on an external storage system such as Amazon S3 and refer to that location
  • load the data in your local process and distribute it with scatter; in this case the data is presumably small enough to fit in memory, and Dask would probably not be doing much for you.
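The last option might look like the following minimal sketch, assuming pandas is installed on the client and on the workers. The helper names scatter_csv and total_rows, and the choice to split the table into row chunks before scattering, are illustrative assumptions rather than part of the original answer; Client.scatter, Client.map, and Client.gather are real dask.distributed methods.

```python
import pandas as pd
from dask.distributed import Client


def scatter_csv(client: Client, path: str, nchunks: int = 4) -> list:
    """Read a CSV in the client process and scatter row chunks to the workers.

    Only the client opens the file, so the workers never need access to it.
    Returns a list of futures, one per chunk.
    """
    pdf = pd.read_csv(path)  # runs locally, e.g. on M1 where the file lives
    step = max(1, -(-len(pdf) // nchunks))  # ceiling division: rows per chunk
    chunks = [pdf.iloc[i:i + step] for i in range(0, len(pdf), step)]
    # Scattering a list returns a list of futures spread across the workers
    return client.scatter(chunks)


def total_rows(client: Client, futures: list) -> int:
    """Run len() on each worker-held chunk and gather the small results back."""
    return sum(client.gather(client.map(len, futures)))
```

Because only the client process opens the file, this works even though the CSV exists only on M1; the trade-off, as noted above, is that the whole table must first fit in the client's memory.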
