Pandas manipulating a DataFrame inplace vs not inplace (inplace=True vs False)

Question

I'm wondering if there's a significant reduction in memory usage when we choose to manipulate a dataframe in-place (compared to not in-place).

I've done a bit of searching on Stack Overflow and came across this post where the answer states that if an operation is not done in-place, a copy of the dataframe is returned (I guess that's a bit obvious when there's an optional parameter called 'inplace' :P).

If I don't need to keep the original dataframe around, it would be beneficial (and logical) to just modify the dataframe in place, right?

Context:

I'm trying to get the top element when sorted by a particular 'column' in the dataframe. I was wondering which of these two is more efficient:

In-place:

df.sort('some_column', ascending=0, inplace=1)
top = df.iloc[0]

vs.

Copy:

top = df.sort('some_column', ascending=0).iloc[0]

For the 'copy' case, the sort still allocates memory for the copy even though I'm not assigning the copy to a variable, right? If so, how long does it take to deallocate that copy from memory?

Thanks for any insights in advance!

Recommended answer

In general, there is no difference in memory use between inplace=True and returning an explicit copy: in both cases, a copy is created. It just so happens that, in the first case, the data in the copy is copied back into the original df object, so reassignment is not necessary.
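The behavior described above can be observed from the outside with a small sketch. It does not measure memory; it only shows that the inplace call returns None while the original object ends up holding the same data that the non-inplace call returns. The column values are made-up example data.

```python
import pandas as pd

df = pd.DataFrame({"some_column": [2, 9, 4]})  # made-up example data

# Non-inplace: returns a new, sorted DataFrame; df itself is left untouched.
sorted_copy = df.sort_values("some_column", ascending=False)
original_order = df["some_column"].tolist()

# inplace=True: the call returns None, and df now holds the sorted data.
returned = df.sort_values("some_column", ascending=False, inplace=True)

# Both routes end with the same values and the same index order.
same_result = df.equals(sorted_copy)
```

Either way the sorted data is built first; the only visible difference is whether the name df is rebound by you or updated by pandas.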

Furthermore, note that as of v0.21, df.sort is no longer available (it was deprecated in favor of sort_values and sort_index and later removed); use sort_values instead.
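For reference, here is roughly how the two snippets from the question might look with sort_values. This is a sketch under assumptions: the DataFrame contents and the label column are invented for illustration, and the in-place variant runs on a separate frame (df2, a hypothetical name) so the two results can be compared.

```python
import pandas as pd

# Made-up data; only "some_column" comes from the question.
df = pd.DataFrame({"some_column": [3, 10, 7], "label": ["a", "b", "c"]})

# "Copy" variant: sort into a temporary frame and take its first row.
top_from_copy = df.sort_values("some_column", ascending=False).iloc[0]

# "In-place" variant, on a separate frame so df stays comparable.
df2 = df.copy()
df2.sort_values("some_column", ascending=False, inplace=True)
top_in_place = df2.iloc[0]
```

Both variants select the same row, so the choice between them is mostly a matter of whether the sorted order is needed afterwards.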
