Move from pandas to Dask to utilize all local CPU cores


Problem description

Recently I stumbled upon http://dask.pydata.org/en/latest/. As I have some pandas code which only runs on a single core, I wonder how to make use of my other CPU cores. Would Dask work well to use all (local) CPU cores? If so, how compatible is it with pandas?

Could I use multiple CPUs with pandas? So far I have read about releasing the GIL, but that all seems rather complicated.

Recommended answer

Would Dask work well to use all (local) CPU cores?

Yes.
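For example, here is a minimal sketch of pointing Dask at an explicit local cluster that spans every CPU core (my illustration, not part of the original answer; it assumes the optional distributed package is installed):

```python
# Minimal sketch (assumption, not from the answer): start a local Dask
# cluster; by default it launches workers/threads across all CPU cores.
from dask.distributed import Client

if __name__ == "__main__":
    client = Client()   # local cluster sized to the machine's cores by default
    print(client)       # summarises how many workers and threads are available
```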

How compatible is it with pandas?

Pretty compatible, but not 100%. You can mix pandas, NumPy, and even pure Python code in with Dask if needed.
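To illustrate how close the two APIs are, here is a minimal sketch (my own illustration, not from the answer) that reads a set of CSV files with dask.dataframe and runs a pandas-style groupby across all local cores; the file pattern and column names are hypothetical:

```python
# Minimal sketch: dask.dataframe mirrors the pandas API but partitions the
# work so it can run on every local core. Files/columns are hypothetical.
import dask.dataframe as dd

df = dd.read_csv("data-*.csv")                        # lazy, partitioned dataframe
result = df.groupby("key")["value"].mean().compute()  # pandas syntax; compute() runs in parallel
print(result)
```

Where the Dask API falls short, map_partitions lets you apply an ordinary pandas function to each partition.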

Could I use multiple CPUs with pandas?

You could. The easiest way would be to use multiprocessing and keep your data separate: have each job independently read from disk and write to disk, if you can do that efficiently. A significantly harder way is mpi4py, which is most useful if you have a multi-machine environment with a professional administrator.
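A minimal sketch of that multiprocessing route (my own illustration; the file pattern and the process_file helper are hypothetical), with each worker independently reading its input and writing its output so no data is shared between processes:

```python
# Minimal sketch (assumption, not from the answer): one process per input
# file; each worker reads from disk and writes back to disk on its own.
import glob
from multiprocessing import Pool

import pandas as pd

def process_file(path):
    df = pd.read_csv(path)                      # independent read per worker
    summary = df.describe()                     # stand-in for the real per-file work
    out_path = path.replace(".csv", "-summary.csv")
    summary.to_csv(out_path)                    # independent write per worker
    return out_path

if __name__ == "__main__":
    paths = glob.glob("data-*.csv")
    with Pool() as pool:                        # defaults to one worker per CPU core
        print(pool.map(process_file, paths))
```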

