Issue with appending to DataFrame if empty

Question

I have a DataFrame that I initialize outside the scope of a local (nested) method. I would like to do the following:

import pandas as pd

def outer_method():
    # ... do outer scope stuff here
    df = pd.DataFrame(columns=['A','B','C','D'])
    def recursive_method(arg):
        # ... do local stuff here
        # func returns a data frame to be appended to empty data frame
        results_df = func(arg)
        df.append(results_df, ignore_index=True)  # returns a new frame; df itself is unchanged
        return results
    recursive_method(arg)
    return df

However, this does NOT work. The df is always empty if I append to it this way.
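
A minimal sketch of why df stays empty, assuming a recent pandas: operations that combine frames return a new DataFrame instead of modifying the one they are called on. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the sketch uses pd.concat, which behaves the same way in this respect. The sample row is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=["A", "B", "C", "D"])
results_df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])

# concat builds and returns a brand-new DataFrame; df itself is untouched
combined = pd.concat([df, results_df], ignore_index=True)

print(len(df))        # 0: the original frame is still empty
print(len(combined))  # 1: only the returned frame holds the new row
```

Unless the returned object is stored somewhere, the appended rows are simply lost.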

I found a possible answer here: appending-to-an-empty-data-frame-in-pandas... It works if the empty DataFrame object is in the scope of the method, but not in my case. As @DSM's comment there puts it: "but the append doesn't happen in-place, so you'll have to store the output if you want it:"

IOW, I would need to have something like:

df = df.append(results_df, ignore_index=True)

in my local method, but this doesn't help me get access to my outer scope variable df to append to it.

Is there a way to make this happen in place? This works fine with Python's list extend method, which extends the contents of a list object in place (I realize DataFrames are not lists, but...). Is there an analogous way to do this with a DataFrame object without having to deal with my scoping issues for df?
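
The list comparison points at the underlying scoping rule. A nested function may mutate an object it closes over (list.extend changes the list in place), but an assignment such as df = df.append(...) or df = pd.concat(...) rebinds the name, which makes it local to the inner function unless it is declared nonlocal. A small sketch with plain lists; the function names are hypothetical.

```python
def collect_with_list():
    rows = []

    def add(batch):
        # Mutating the enclosing list is fine: no new binding of `rows` is created
        rows.extend(batch)

    add([1, 2])
    add([3])
    return rows


def collect_by_rebinding():
    total = []

    def add(batch):
        # Assigning to `total` makes it local to add(), so reading it first fails
        total = total + batch

    add([1, 2])
    return total


print(collect_with_list())  # [1, 2, 3]

try:
    collect_by_rebinding()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

In the question's first version the returned frame is simply discarded; in a df = df.append(...) version inside the nested function, the rebinding would raise UnboundLocalError, which is the scope problem the answer below addresses.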

Btw, the Pandas concat method also works, but I run into the issue of variable scope.

Recommended answer

In Python3, you could use the nonlocal keyword:

def outer_method():
    # ... do outer scope stuff here
    df = pd.DataFrame(columns=['A','B','C','D'])
    def recursive_method(arg):
        nonlocal df  # lets the assignment below rebind outer_method's df
        # ... do local stuff here
        # func returns a data frame to be appended to empty data frame
        results_df = func(arg)
        df = df.append(results_df, ignore_index=True)
        return results
    recursive_method(arg)
    return df
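
A runnable variant of the nonlocal approach, under stated assumptions: make_batch is a hypothetical stand-in for func, the recursion stops at a hypothetical max_depth, and pd.concat replaces DataFrame.append (removed in pandas 2.0). The repeated concat still carries the quadratic copying cost described next.

```python
import pandas as pd


def make_batch(depth):
    # Hypothetical stand-in for the question's func(): returns one row per call
    return pd.DataFrame([[depth, depth * 2, depth * 3, depth * 4]],
                        columns=["A", "B", "C", "D"])


def outer_method(max_depth):
    df = pd.DataFrame(columns=["A", "B", "C", "D"])

    def recursive_method(depth):
        nonlocal df  # rebinding df below now updates outer_method's variable
        if depth > max_depth:
            return
        # Each concat returns a new frame, which is stored back into df
        df = pd.concat([df, make_batch(depth)], ignore_index=True)
        recursive_method(depth + 1)

    recursive_method(1)
    return df


print(outer_method(3))
```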

But note that calling df.append returns a new DataFrame each time and thus requires copying all the old data into the new DataFrame. If you do this inside a loop N times, you end up copying on the order of 1+2+3+...+N = O(N^2) rows (assuming one row per append), which is very bad for performance.
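
To make the 1 + 2 + ... + N count concrete, here is a small cost model (an assumption, not a benchmark): each append adds one row and copies every row of the growing frame into a new allocation, while collecting the pieces and concatenating once copies each row a single time. The repeated-append total is N(N+1)/2.

```python
def rows_copied_by_repeated_append(n_appends, rows_per_append=1):
    """Count rows copied when every append rebuilds the whole frame."""
    copied = 0
    size = 0
    for _ in range(n_appends):
        size += rows_per_append
        copied += size  # the new frame holds all old rows plus the new ones
    return copied


def rows_copied_by_single_concat(n_appends, rows_per_append=1):
    """Collecting pieces first and concatenating once copies each row once."""
    return n_appends * rows_per_append


for n in (10, 100, 1000):
    print(n, rows_copied_by_repeated_append(n), rows_copied_by_single_concat(n))
```

With N = 1000 the model gives 500500 row copies for repeated appends versus 1000 for a single concat.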

If you do not need df inside recursive_method for any purpose other than appending, it is better to append to a list, and then construct the DataFrame (by calling pd.concat once) after recursive_method is done:

df = pd.DataFrame(columns=['A','B','C','D'])
data = [df]
def recursive_method(arg, data):
    # ... do stuff here
    # func returns a data frame to be appended to empty data frame
    results_df = func(arg)
    data.append(results_df)  # list.append mutates data in place; no frame is copied here
    return results
recursive_method(arg, data)
df = pd.concat(data, ignore_index=True)

This is the best solution if all you need to do is collect data inside recursive_method and can wait to construct the new df after recursive_method is done.
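
A runnable sketch of the collect-then-concat pattern, assuming a hypothetical make_batch in place of func and a max_depth to end the recursion. Unlike the snippet above, it starts from an empty list rather than seeding it with the empty frame; that works as long as at least one batch is collected, because pd.concat raises ValueError when given an empty list (one practical reason to seed data with the empty df).

```python
import pandas as pd


def make_batch(depth):
    # Hypothetical stand-in for the question's func()
    return pd.DataFrame([[depth, depth * 2, depth * 3, depth * 4]],
                        columns=["A", "B", "C", "D"])


def recursive_method(depth, max_depth, data):
    if depth > max_depth:
        return
    data.append(make_batch(depth))  # mutates the caller's list; no DataFrame copying
    recursive_method(depth + 1, max_depth, data)


data = []
recursive_method(1, 3, data)
df = pd.concat(data, ignore_index=True)  # every row is copied exactly once, here
print(df)
```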

In Python 2, if you must use df inside recursive_method, then you could pass df as an argument to recursive_method and return df as well:

df = pd.DataFrame(columns=['A','B','C','D'])
def recursive_method(arg, df):
    # ... do stuff here (including whatever ends the recursion)
    results, df = recursive_method(arg, df)  # recurse, threading df through
    # func returns a data frame to be appended to empty data frame
    results_df = func(arg)
    df = df.append(results_df, ignore_index=True)
    return results, df
results, df = recursive_method(arg, df)

but be aware that you will be paying a heavy price doing the O(N^2) copying mentioned above.
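
The pass-in/return-out pattern works unchanged in Python 3. A sketch with an explicit base case, assuming the same hypothetical make_batch and max_depth stand-ins and pd.concat in place of the removed DataFrame.append; the returned depth plays the role of results.

```python
import pandas as pd


def make_batch(depth):
    # Hypothetical stand-in for the question's func()
    return pd.DataFrame([[depth, depth * 2, depth * 3, depth * 4]],
                        columns=["A", "B", "C", "D"])


def recursive_method(depth, max_depth, df):
    """Thread df through the recursion by passing it in and returning it."""
    if depth > max_depth:
        return depth, df
    # Every concat allocates a new frame, so the quadratic copying still applies
    df = pd.concat([df, make_batch(depth)], ignore_index=True)
    return recursive_method(depth + 1, max_depth, df)


start = pd.DataFrame(columns=["A", "B", "C", "D"])
last_depth, df = recursive_method(1, 3, start)
print(last_depth, len(df), len(start))
```

The caller's start frame is never modified; only the returned df carries the rows.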

Why DataFrames should not be appended to in-place:

The underlying data in a DataFrame is stored in NumPy arrays. The data in a NumPy array comes from a contiguous block of memory. Sometimes there is not enough room to resize a NumPy array into a larger contiguous block even if memory is available: imagine the array sandwiched between other data structures. In that case, to resize the array, a new, larger block of memory has to be allocated somewhere else and all the data from the original array has to be copied to the new block. In general, it can't be done in-place.
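
The NumPy side of this can be observed directly. A small sketch: np.append does not grow the original array; it allocates a new one and copies the data into it, which np.shares_memory confirms.

```python
import numpy as np

a = np.arange(3)
b = np.append(a, [3, 4])  # allocates a new array and copies a's data into it

print(a)                       # [0 1 2]: the original is unchanged
print(b)                       # [0 1 2 3 4]
print(np.shares_memory(a, b))  # False: b lives in a separate block of memory
```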

DataFrames do have a private method, _update_inplace, which could be used to redirect a DataFrame's underlying data to new data. This is only a pseudo-inplace operation, since the new data (think NumPy arrays) has to be allocated (with all the attendant copying) first. So using _update_inplace has two strikes against it: it uses a private method which (in theory) may not be around in future versions of Pandas, and it incurs the O(N^2) copying penalty.

In [231]: df = pd.DataFrame([[0,1,2]])

In [232]: df
Out[232]: 
   0  1  2
0  0  1  2

In [233]: df._update_inplace(df.append([[3,4,5]]))

In [234]: df
Out[234]: 
   0  1  2
0  0  1  2
0  3  4  5
