Issue with appending to DataFrame if empty

Question

I have a data frame that I initialize outside the scope of a local method. I would like to do the following:

import pandas as pd

def outer_method():
    # ... do outer scope stuff here
    df = pd.DataFrame(columns=['A','B','C','D'])
    def recursive_method(arg):
        # ... do local stuff here
        # func returns a data frame to be appended to empty data frame
        results_df = func(args)
        df.append(results_df, ignore_index=True)
        return results
    recursive_method(arg)
    return df

However, this does NOT work. The df is always empty if I append to it this way.

I found an answer to my problem here: appending-to-an-empty-data-frame-in-pandas. It works if the empty DataFrame object is in the scope of the method, but not in my case. As per @DSM's comment: "but the append doesn't happen in-place, so you'll have to store the output if you want it:"

IOW, I would need to have something like:

df = df.append(results_df, ignore_index=True)

in my local method, but this doesn't help me get access to my outer scope variable df to append to it.

Is there a way to make this happen in place? This works fine with Python's extend method for extending the contents of a list object (I realize DataFrames are not lists, but...). Is there an analogous way to do this with a DataFrame object without having to deal with my scoping issues for df?

Btw, the Pandas concat method also works, but I run into the issue of variable scope.

Recommended answer

In Python 3, you could use the nonlocal keyword:

import pandas as pd

def outer_method():
    # ... do outer scope stuff here
    df = pd.DataFrame(columns=['A','B','C','D'])
    def recursive_method(arg):
        # rebind the df that belongs to outer_method
        nonlocal df
        # ... do local stuff here
        # func returns a data frame to be appended to empty data frame
        results_df = func(args)
        df = df.append(results_df, ignore_index=True)
        return results
    recursive_method(arg)
    return df
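As a runnable illustration of the nonlocal approach, here is a minimal sketch. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the sketch rebinds df with pd.concat instead. The helper func, the two-column layout, and the countdown recursion are assumptions made up for this example; they are not part of the original question.

```python
import pandas as pd


def func(arg):
    # Hypothetical stand-in for the question's func: returns a one-row frame.
    return pd.DataFrame({'A': [arg], 'B': [arg * 2]})


def outer_method(n):
    # Empty frame with typed columns, owned by the outer scope.
    df = pd.DataFrame({'A': pd.Series(dtype='int64'),
                       'B': pd.Series(dtype='int64')})

    def recursive_method(arg):
        # Without this declaration, the assignment below would create
        # a new local df instead of rebinding the outer one.
        nonlocal df
        if arg == 0:
            return
        # pd.concat returns a new frame, so the result must be stored.
        df = pd.concat([df, func(arg)], ignore_index=True)
        recursive_method(arg - 1)

    recursive_method(n)
    return df


result = outer_method(3)
print(result)
```

Each call still builds a brand-new frame, so this fixes the scoping problem but not the copying cost.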

But note that calling df.append returns a new DataFrame each time and thus requires copying all the old data into the new DataFrame. If you do this inside a loop N times, you end up making on the order of 1+2+3+...+N = O(N^2) copies -- very bad for performance.
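The quadratic growth can be checked with a little arithmetic. The sketch below only counts rows written. It assumes each append adds a single row and that every row of the new frame is written once per append. The function names are made up for this illustration.

```python
def rows_written_by_repeated_append(n):
    """Count rows written when a frame grows one row at a time."""
    total = 0
    size = 0
    for _ in range(n):
        size += 1      # the new frame holds one more row than the old one
        total += size  # every row of the new frame has to be written
    return total


def rows_written_by_single_concat(n):
    """Count rows written when n one-row pieces are joined once."""
    return n


n = 1000
print(rows_written_by_repeated_append(n))  # 1 + 2 + ... + n = n * (n + 1) / 2
print(rows_written_by_single_concat(n))
```

For n = 1000 the repeated approach writes 500500 rows against 1000 for a single concatenation.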

If you do not need df inside recursive_method for any purpose other than appending, it is better to append to a list, and then construct the DataFrame (by calling pd.concat once) after recursive_method is done:

df = pd.DataFrame(columns=['A','B','C','D'])
data = [df]
def recursive_method(arg, data):
    # ... do stuff here
    # func returns a data frame to be appended to empty data frame
    results_df = func(args)
    # list.append mutates data in place, so no rebinding is needed
    data.append(results_df)
    return results
recursive_method(arg, data)
df = pd.concat(data, ignore_index=True)

This is the best solution if all you need to do is collect data inside recursive_method and you can wait to construct the new df until recursive_method is done.
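A self-contained version of the collect-then-concatenate pattern might look like the sketch below. The helper func and the countdown recursion are hypothetical stand-ins for the elided work, and the list starts empty here rather than holding an empty frame.

```python
import pandas as pd


def func(arg):
    # Hypothetical stand-in for the question's func: returns a one-row frame.
    return pd.DataFrame({'A': [arg], 'B': [arg * 2]})


def recursive_method(arg, data):
    if arg == 0:
        return
    # list.append mutates the list in place, so neither nonlocal
    # nor a return value is needed to see the change outside.
    data.append(func(arg))
    recursive_method(arg - 1, data)


data = []
recursive_method(3, data)

# Build the final frame once, after all pieces have been collected.
df = pd.concat(data, ignore_index=True)
print(df)
```

Because the list is only mutated and never rebound, the scoping problem from the question does not arise, and the data is copied once at the end.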

In Python 2, if you must use df inside recursive_method, then you could pass df as an argument to recursive_method and return df too:

df = pd.DataFrame(columns=['A','B','C','D'])
def recursive_method(arg, df):
    # ... do stuff here
    # the recursive call hands back the updated df
    results, df = recursive_method(arg, df)
    # func returns a data frame to be appended to empty data frame
    results_df = func(args)
    df = df.append(results_df, ignore_index=True)
    return results, df
results, df = recursive_method(arg, df)

But be aware that you will pay a heavy price for the O(N^2) copying mentioned above.

Why DataFrames should not be appended to in-place:

The underlying data in a DataFrame is stored in NumPy arrays. The data in a NumPy array comes from a contiguous block of memory. Sometimes there is not enough space to resize the NumPy arrays to a larger contiguous block of memory even if memory is available -- imagine the array being sandwiched in between other data structures. In that case, in order to resize the array, a new larger block of memory has to be allocated somewhere else and all the data from the original array has to be copied to the new block. In general, it can't be done in-place.
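The same behavior can be seen directly in NumPy. In this small sketch, np.append returns a freshly allocated array and leaves the original untouched. The variable names are chosen only for illustration.

```python
import numpy as np

a = np.arange(3)           # array([0, 1, 2]) in one contiguous block
b = np.append(a, [3, 4])   # allocates a new, larger array and copies a into it

print(a)                       # the original array is unchanged
print(b)                       # the new array holds the old and new values
print(np.shares_memory(a, b))  # the two arrays do not share a buffer
```

Growing an array therefore costs a full copy of the existing data, which is why appending row by row adds up quickly.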

DataFrames do have a private method, _update_inplace, which could be used to redirect a DataFrame's underlying data to new data. This is only a pseudo-inplace operation, since the new data (think NumPy arrays) has to be allocated (with all the attendant copying) first. So using _update_inplace has two strikes against it: it uses a private method which (in theory) may not be around in future versions of Pandas, and it incurs the O(N^2) copying penalty.

In [231]: df = pd.DataFrame([[0,1,2]])

In [232]: df
Out[232]: 
   0  1  2
0  0  1  2

In [233]: df._update_inplace(df.append([[3,4,5]]))

In [234]: df
Out[234]: 
   0  1  2
0  0  1  2
0  3  4  5
