How to run Dask on multiple machines?

Published: 2020/10/15 18:38:28 · Tag: dask

Question

I recently discovered Dask. I have some very basic questions about Dask DataFrames and other data structures.

Is a Dask DataFrame an immutable data type?

Are Dask arrays and DataFrames lazy data structures?

I don't know whether to use Dask, Spark, or pandas for my situation. I have 200 GB of data to process. Running the operations with a plain Python program took 9 hours, but the work could be done in parallel, in less time, by using all cores of a 16-core processor. If I split the DataFrame myself in pandas, I need to worry about whether my calculations are commutative and associative. On the other hand, I can use a standalone Spark cluster to simply split up the data and run it in parallel.
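The concern about commutative and associative properties can be illustrated with a small, hypothetical plain-Python sketch (no pandas or Dask involved; all function names are illustrative). Each chunk returns a `(sum, count)` pair, and because addition is commutative and associative, those pairs can be combined in any order. Naively averaging the per-chunk means, by contrast, gives the wrong answer when chunks have different sizes.

```python
from typing import Iterable, List, Tuple


def split_into_chunks(data: List[float], n_chunks: int) -> List[List[float]]:
    """Split data into up to n_chunks contiguous pieces (the last may be shorter)."""
    if not data or n_chunks < 1:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk: List[float]) -> Tuple[float, int]:
    """Per-chunk work; in a parallel setup each call could run on its own core."""
    return sum(chunk), len(chunk)


def combine(parts: Iterable[Tuple[float, int]]) -> float:
    """Merge (sum, count) pairs. Addition is commutative and associative,
    so the order in which chunk results arrive does not change the result."""
    total, count = 0.0, 0
    for s, c in parts:
        total += s
        count += c
    return total / count


def naive_mean_of_means(chunks: List[List[float]]) -> float:
    """Incorrect combination: averages per-chunk means, ignoring chunk sizes."""
    means = [sum(c) / len(c) for c in chunks]
    return sum(means) / len(means)


if __name__ == "__main__":
    data = list(range(1, 11))              # true mean is 5.5
    chunks = split_into_chunks(data, 3)    # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
    parts = [partial_stats(c) for c in chunks]
    print(combine(parts))                  # 5.5
    print(combine(reversed(parts)))        # 5.5, order does not matter
    print(naive_mean_of_means(chunks))     # about 6.17, wrong
```

Frameworks such as Dask and Spark apply this same split, compute partial results, then combine pattern for their built-in aggregations, which is why a mean over a distributed DataFrame comes out correct without the user reasoning about chunk sizes.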

Do I need to set up a cluster in Dask, as I would in Spark?
How do I run Dask DataFrames on my own compute nodes?
Does Dask need a master-slave setup?

I am a fan of pandas, so I am looking for solutions similar to pandas.
