Does Dask communicate with HDFS to optimize for data locality?

Question

In the Dask distributed documentation, they have the following information:

For example Dask developers use this ability to build in data locality when we communicate to data-local storage systems like the Hadoop File System. When users use high-level functions like dask.dataframe.read_csv('hdfs:///path/to/files.*.csv') Dask talks to the HDFS name node, finds the locations of all of the blocks of data, and sends that information to the scheduler so that it can make smarter decisions and improve load times for users.

However, it seems that get_block_locations() was removed from the HDFS fs backend, so my question is: what is the current state of Dask with regard to HDFS? Is it sending computation to the nodes where the data is local? Is the scheduler taking data locality on HDFS into account?
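
For reference, a minimal sketch of the high-level call being asked about, assuming a namenode reachable at namenode:8020 (the host, port and path are placeholders, not taken from the original question):

    import dask.dataframe as dd

    # Dask hands the "hdfs://" URL to its filesystem layer (fsspec/pyarrow),
    # which talks to the namenode and splits the matched files into partitions.
    df = dd.read_csv("hdfs://namenode:8020/path/to/files.*.csv")

    # Each partition becomes a task; whether those tasks land on the datanodes
    # holding the corresponding blocks is exactly what the question is about.
    print(df.head())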

Answer

Quite right: with the appearance of arrow's HDFS interface, which is now preferred over hdfs3, block locations are no longer considered in workloads accessing HDFS, since arrow's implementation doesn't include the get_block_locations() method.

However, we already wanted to remove the somewhat convoluted code which made this work, because we found that the inter-node bandwidth on test HDFS deployments was good enough that locality made little practical difference in most workloads. The extra constraints tying the size of the blocks to the size of the partitions you would like in memory added a further layer of complexity.
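
As a hedged sketch of that point: the size of each in-memory partition is chosen by the user through the blocksize argument of dask.dataframe.read_csv, independently of where HDFS happens to place its blocks (the path and the 128MB value are illustrative):

    import dask.dataframe as dd

    df = dd.read_csv(
        "hdfs://namenode:8020/path/to/files.*.csv",
        blocksize="128MB",  # target size of each in-memory partition
    )

    # One task per partition; the scheduler no longer tries to line these
    # partitions up with the physical HDFS blocks underneath them.
    print(df.npartitions)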

By removing the specialised code, we could avoid the very special case that was being made for HDFS as opposed to external cloud storage (s3, gcs, azure), where it didn't matter which worker accessed which part of the data.
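
If placement still matters for a particular deployment, it can be done by hand with the distributed client's workers= keyword; this is only an illustrative sketch (the scheduler and worker addresses are placeholders), not something Dask does automatically for HDFS any more:

    import fsspec
    import pandas as pd
    from dask.distributed import Client

    client = Client("tcp://scheduler:8786")

    def load_part(path):
        # Any worker can read any byte range through fsspec/pyarrow, so this
        # placement is purely an optional hint, not a requirement.
        with fsspec.open(path) as f:
            return pd.read_csv(f)

    future = client.submit(
        load_part,
        "hdfs://namenode:8020/path/to/files.0.csv",
        workers=["tcp://datanode-1:40000"],  # restrict execution to this worker
    )
    result = future.result()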

In short, yes, the docs should be updated.
