Does dask dataframe apply preserve row order?

Question

I am considering using a closure that holds the current state to compute a rolling window (of width 2 in my case), as an answer to a question I posed recently. Something along the lines of:

def test(init_value):
    def my_fcn(x, y):
        nonlocal init_value
        actual_value = (x + y) * init_value
        init_value = actual_value
        return init_value

    return my_fcn

where my_fcn is a dummy function used for testing. The function could be initialised through actual_fcn = test(0), assuming for example that the initial value is zero. Finally, it could be used through ddf.apply (where ddf is the actual dask dataframe).

Now the question: this works only if the order of the computations is preserved; otherwise everything gets scrambled. I have not tested it, because even if a test passes I cannot be 100% sure that the order will always be preserved. So the question is:

Does dask dataframe's apply method preserve row order?

Any other ideas? Any help highly appreciated.

Recommended answer

Apparently yes. I am using dask 1.0.0.

The following code:

import numpy as np
import pandas as pd
import dask.dataframe as dd

number_of_components = 30

# 30 rows of random integers in [0, 30) across columns A-D
df = pd.DataFrame(np.random.randint(0, number_of_components, size=(number_of_components, 4)), columns=list('ABCD'))
# A single partition: the whole frame lives in one pandas DataFrame
my_data_frame = dd.from_pandas(df, npartitions=1)


def sumPrevious(previousState):
    # Closure that remembers the 'A' value of the previously seen row
    def getValue(row):
        nonlocal previousState
        something = row['A'] - previousState
        previousState = row['A']
        return something
    return getValue


given_func = sumPrevious(1)
out = my_data_frame.apply(given_func, axis=1, meta=float).compute()

behaves as expected. There is a big caveat: if the previous state is passed by reference (i.e. it is an object of some class), the user should be careful about updating it inside the nested function with plain assignment (=). Assignment only rebinds the name to the same object rather than copying it, so the update can have side effects.

Strictly speaking, this example does not prove that order is preserved under all circumstances, so I would still like to know whether I can rely on this assumption.
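
As a rough illustration of why the single-partition test is not conclusive, the sketch below simulates partitions with plain Python lists (it does not use dask, and the values are made up). It runs the same kind of closure over two chunks in order and then with the second chunk first, and compares both with a stateless version that pairs each value with its predecessor explicitly. Whether dask ever processes partitions out of order in a given setup is exactly the open question; the sketch only shows what would happen if it did.

```python
def sum_previous(previous_state):
    def get_value(value):
        nonlocal previous_state
        something = value - previous_state
        previous_state = value
        return something
    return get_value


values = [5, 8, 6, 10]                   # hypothetical column 'A'
partitions = [values[:2], values[2:]]    # two simulated partitions

# Partitions visited in their natural order
f = sum_previous(1)
in_order = [[f(v) for v in part] for part in partitions]

# Same closure logic, but the second partition is visited first
g = sum_previous(1)
second = [g(v) for v in partitions[1]]
first = [g(v) for v in partitions[0]]
out_of_order = [first, second]

# Stateless alternative: build the "previous" column explicitly,
# so the result does not depend on evaluation order
previous = [1] + values[:-1]
stateless = [v - p for v, p in zip(values, previous)]

print(in_order)      # [[4, 3], [-2, 4]]
print(out_of_order)  # [[-5, 3], [5, 4]]
print(stateless)     # [4, 3, -2, 4]
```

The stateless form gives the in-order answer regardless of how the work is scheduled, which makes it the safer pattern when the evaluation order cannot be guaranteed.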
