Basic groupby operations in Dask

Views: 114. Published: 2020/5/24 3:25:48. Tags: python, pandas, dask

Problem description

I am trying to use Dask to process a large file (50 GB). Normally I would load it into memory and use pandas. I want to group by two columns, "A" and "B", and whenever column "C" starts with a value, I want to repeat (forward-fill) that value down the column within that particular group.

In pandas, I would do the following:

df['C'] = df.groupby(['A', 'B'])['C'].ffill()
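To make the intended behaviour concrete, here is a minimal pandas sketch on a small made-up frame (the data is purely illustrative and not from the original file). It assumes that "repeat the value" means a forward fill within each ("A", "B") group, so a group that begins with a missing value keeps that gap instead of borrowing a value from another group.

```python
import numpy as np
import pandas as pd

# Illustrative data only: two groups, ("x", 1) and ("y", 1), interleaved.
df = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y", "x"],
        "B": [1, 1, 1, 1, 1],
        "C": [1.0, np.nan, np.nan, 2.0, np.nan],
    }
)

# Forward-fill C separately inside each (A, B) group.
df["C"] = df.groupby(["A", "B"])["C"].ffill()

print(df)
```

The first row of group ("y", 1) has no earlier value in its own group, so it stays missing even though the preceding row (from group "x") holds a value.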

What would be the equivalent in Dask? Also, I am a little lost as to how to structure problems in Dask as opposed to pandas.

Thanks.

My progress so far:

First, set the index:

df1 = df.set_index(['A','B'])

Then group by:

df1.groupby(['A','B']).apply(lambda x: x.ffill()).compute()
