How do I change rows and columns in a dask dataframe?

Question

I am having a few issues with Dask DataFrames.

Let's say I have a dataframe with 2 columns ['a','b'].

If I want a new column c = a + b, in pandas I would do:

df['c'] = df['a'] + df['b']

In Dask I am doing the same operation as follows:

df = df.assign(c=(df.a + df.b).compute())

Is it possible to write this operation in a better way, similar to what we do in pandas?

The second question is the one troubling me more.

In pandas, if I want to change the value of 'a' for rows 2 and 6 to np.pi, I do the following:

df.loc[[2, 6], 'a'] = np.pi

I have not been able to figure out how to do a similar operation in Dask. My logic selects some rows, and I only want to change values in those rows.

Recommended answer

Edit: Add New Columns

Setitem syntax now works in dask.dataframe:

df['z'] = df.x + df.y

Old answer: Add New Columns

You are correct that, at the time, the setitem syntax did not work in dask.dataframe:

df['c'] = ... # mutation not supported

As you suggest, you should instead use .assign(...):

df = df.assign(c=df.a + df.b)

In your example you have an unnecessary call to .compute(). Generally you want to call compute only at the very end, once you have your final result.

As before, dask.dataframe does not support changing rows in place. In-place operations are difficult to reason about in parallel code. At the moment dask.dataframe has no nice alternative operation for this case. I've raised issue #653 for conversation on this topic.
