Issue with appending to DataFrame if empty
Question
I have a data frame that I initialize outside the scope of a local method. I would like to do the following:
import pandas as pd

def outer_method():
    # ... do outer scope stuff here
    df = pd.DataFrame(columns=['A','B','C','D'])
    def recursive_method(arg):
        # ... do local stuff here
        # func returns a data frame to be appended to the empty data frame
        results_df = func(arg)
        df.append(results_df, ignore_index=True)
        return results
    recursive_method(arg)
    return df
However, this does NOT work. The df is always empty if I append to it this way.
I found the answer to my problem here: appending-to-an-empty-data-frame-in-pandas... This works IF the empty DataFrame object is in scope of the method, but not in my case. As per @DSM's comment: "but the append doesn't happen in-place, so you'll have to store the output if you want it:"
In other words, I would need to have something like:
df = df.append(results_df, ignore_index=True)
in my local method, but this doesn't give me access to my outer scope variable df so that I can append to it.
Is there a way to make this happen in place? This works fine with the Python extend method for extending the contents of a list object (I realize DataFrames are not lists, but...). Is there an analogous way to do this with a DataFrame object without having to deal with my scoping issues for df?
Btw, the Pandas concat method also works, but I run into the same issue of variable scope.
Recommended answer
In Python3, you could use the nonlocal keyword:
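def outer_method():
    # ... do outer scope stuff here
    df = pd.DataFrame(columns=['A','B','C','D'])
    def recursive_method(arg):
        nonlocal df  # rebind the df of outer_method rather than a new local
        # ... do local stuff here
        # func returns a data frame to be appended to the empty data frame
        results_df = func(arg)
        # DataFrame.append was removed in pandas 2.0; there, use
        # pd.concat([df, results_df], ignore_index=True) instead
        df = df.append(results_df, ignore_index=True)
        return results
    recursive_method(arg)
    return df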
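A runnable variant of the nonlocal approach might look like the sketch below. It is not the original poster's code: func is a hypothetical stand-in that returns a one-row frame, the recursion simply counts down to zero, and pd.concat is used in place of df.append on the assumption of a recent pandas where DataFrame.append is no longer available. The frame also starts as None rather than as an empty DataFrame, purely to avoid concatenating an empty frame.

```python
import pandas as pd

def func(arg):
    # Hypothetical stand-in for the question's func: one row per call.
    return pd.DataFrame({'A': [arg], 'B': [arg * 2], 'C': [arg * 3], 'D': [arg * 4]})

def outer_method(depth):
    df = None  # nothing collected yet

    def recursive_method(arg):
        nonlocal df  # assignments below rebind outer_method's df
        if arg == 0:  # base case (assumed for this sketch)
            return
        results_df = func(arg)
        if df is None:
            df = results_df
        else:
            # concat returns a new frame, so the result must be stored
            df = pd.concat([df, results_df], ignore_index=True)
        recursive_method(arg - 1)

    recursive_method(depth)
    return df

out = outer_method(3)
```

Without the nonlocal line, the assignment to df inside recursive_method would make df a local name and the first read of it would raise UnboundLocalError.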
But note that each call to df.append returns a new DataFrame and thus requires copying all the old data into the new DataFrame. If you do this inside a loop N times, you end up copying on the order of 1+2+3+...+N = N(N+1)/2 = O(N^2) rows, which is very bad for performance.
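The arithmetic behind that estimate can be checked with plain Python. The sketch below does not touch pandas at all; it only counts rows under the simplifying assumption that every append adds a fixed number of rows and copies the whole result into a fresh frame. Both function names are made up for this illustration.

```python
def rows_copied_by_repeated_append(n, rows_per_chunk=1):
    """Rows written if every append builds a brand-new frame."""
    total = 0
    size = 0
    for _ in range(n):
        size += rows_per_chunk
        total += size  # old rows plus new rows are copied into the new frame
    return total

def rows_copied_by_single_concat(n, rows_per_chunk=1):
    """Rows written if all chunks are joined once at the end."""
    return n * rows_per_chunk

quadratic = rows_copied_by_repeated_append(1000)
linear = rows_copied_by_single_concat(1000)
```

For 1000 one-row appends, the first count is 1000 * 1001 / 2 = 500500 rows, against 1000 rows for a single concatenation.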
If you do not need df inside recursive_method for any purpose other than appending, it is better to append to a list, and then construct the DataFrame (by calling pd.concat once) after recursive_method is done:
df = pd.DataFrame(columns=['A','B','C','D'])
data = [df]
def recursive_method(arg, data):
    # ... do stuff here
    # func returns a data frame to be collected in the list
    results_df = func(arg)
    data.append(results_df)  # list.append mutates data in place
    return results
recursive_method(arg, data)
df = pd.concat(data, ignore_index=True)
This is the best solution if all you need to do is collect data inside recursive_method and you can wait to construct the new df until after recursive_method is done.
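As a self-contained illustration of the list-then-concat pattern, here is a minimal sketch. As before, func is a hypothetical stand-in returning a one-row frame and the recursion counting down to zero is an assumption made only so the example runs. The list starts empty rather than holding an empty DataFrame, and the fallback to an empty frame with the expected columns is an added convenience.

```python
import pandas as pd

COLUMNS = ['A', 'B', 'C', 'D']

def func(arg):
    # Hypothetical stand-in for the question's func: one row per call.
    return pd.DataFrame([[arg, arg, arg, arg]], columns=COLUMNS)

def recursive_method(arg, data):
    if arg == 0:  # base case (assumed for this sketch)
        return
    data.append(func(arg))  # mutates the shared list; no rebinding needed
    recursive_method(arg - 1, data)

data = []
recursive_method(3, data)

# One concatenation at the end; fall back to an empty frame if nothing was collected.
df = pd.concat(data, ignore_index=True) if data else pd.DataFrame(columns=COLUMNS)
```

Because list.append mutates the list object, the inner function never assigns to an outer name, so the scoping problem from the question does not arise.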
In Python2, if you must use df inside recursive_method, then you could pass df as an argument to recursive_method, and return df too:
df = pd.DataFrame(columns=['A','B','C','D'])
def recursive_method(arg, df):
    # ... do stuff here
    results, df = recursive_method(arg, df)
    # func returns a data frame to be appended to the empty data frame
    results_df = func(args)
    df = df.append(results_df, ignore_index=True)
    return results, df
results, df = recursive_method(arg, df)
but be aware that you will pay the heavy price of the O(N^2) copying mentioned above.
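A runnable sketch of threading the frame through the recursion could look like this. It simplifies the pattern above by returning only the frame, uses a hypothetical func and a count-down base case, and relies on pd.concat rather than df.append so that it also runs on pandas versions without DataFrame.append.

```python
import pandas as pd

def func(arg):
    # Hypothetical stand-in for the question's func: one row per call.
    return pd.DataFrame({'A': [arg], 'B': [arg], 'C': [arg], 'D': [arg]})

def recursive_method(arg, df):
    if arg == 0:  # base case (assumed for this sketch)
        return df
    results_df = func(arg)
    # Each step builds a new frame and hands it to the next call.
    df = results_df if df is None else pd.concat([df, results_df], ignore_index=True)
    return recursive_method(arg - 1, df)

df = recursive_method(3, None)
```

Every level still copies the accumulated rows, so the quadratic cost remains; the only gain is that no nonlocal statement is needed.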
Why DataFrames cannot (and should not) be appended to in-place:
The underlying data in a DataFrame is stored in NumPy arrays. The data in a NumPy array occupies a contiguous block of memory. Sometimes there is not enough room to grow that block in place, even when memory is available elsewhere: imagine the array being sandwiched between other data structures. In that case, in order to resize the array, a new larger block of memory has to be allocated somewhere else and all the data from the original array has to be copied to the new block. In general, it can't be done in-place.
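The same behaviour is visible at the NumPy level. In this small sketch, np.append returns a freshly allocated array and leaves the original untouched; the variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, [4, 5])  # allocates a new array and copies a's data into it

unchanged = a.tolist()           # the original array keeps its old contents
grown = b.tolist()               # the new array holds old plus new values
shares = np.shares_memory(a, b)  # False: b lives in a separate buffer
```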
DataFrames do have a private method, _update_inplace, which could be used to redirect a DataFrame's underlying data to new data. This is only a pseudo-inplace operation, since the new data (think NumPy arrays) has to be allocated first, with all the attendant copying. So using _update_inplace has two strikes against it: it is a private method which (in theory) may not be around in future versions of Pandas, and it still incurs the O(N^2) copying penalty.
In [231]: df = pd.DataFrame([[0,1,2]])

In [232]: df
Out[232]:
   0  1  2
0  0  1  2

In [233]: df._update_inplace(df.append([[3,4,5]]))

In [234]: df
Out[234]:
   0  1  2
0  0  1  2
0  3  4  5