In pandas, is inplace = True considered harmful, or not?


Question

This has been discussed before, but with conflicting answers:

What I'd like to know is:

  • Why is inplace = False the default behavior?
  • When is it good to change it? (Well, I'm allowed to change it, so I guess there's a reason.)
  • Is this a safety issue? That is, can an operation fail or misbehave due to inplace = True?
  • Can I know in advance whether a certain inplace = True operation will "really" be carried out in place?

My current understanding:

  • Many pandas operations have an inplace parameter, always defaulting to False, meaning the original DataFrame is untouched and the operation returns a new DataFrame.
  • When setting inplace = True, the operation might work on the original DataFrame, but it might still work on a copy behind the scenes and just reassign the reference when done.

Pros of inplace = True:

  • Can be both faster and less memory-hungry (the first link shows reset_index() running twice as fast and using half the peak memory!).

Pros of inplace = False:

  • Allows chained/functional syntax: df.dropna().rename().sum()... which is nice, and offers a chance for lazy evaluation or a more efficient reordering (though I don't think pandas is doing this).
  • When using inplace = True on an object which is potentially a slice/view of an underlying DataFrame, pandas has to do a SettingWithCopy check, which is expensive. inplace = False avoids this.
  • Consistent and predictable behavior behind the scenes.

So, putting the copy-vs-view issue aside, it seems more performant to always use inplace = True, unless specifically writing a chained statement. But that's not the default pandas opts for, so what am I missing?
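
For reference, the two calling conventions discussed in the question can be sketched as follows. This is a minimal illustration only; the frame and the choice of sort_values() are assumptions made for the example, not something taken from the question.

```python
import pandas as pd

# Hypothetical example frame (not from the original question).
df = pd.DataFrame({"a": [3, 1, 2]})

# Default (inplace=False): a new DataFrame is returned, df is left untouched.
sorted_df = df.sort_values("a")
order_after_default_call = df["a"].tolist()

# inplace=True: df itself is modified and the call returns None.
result = df.sort_values("a", inplace=True)
order_after_inplace_call = df["a"].tolist()
```

Note that this only shows the visible semantics. It says nothing about whether a copy is made internally.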

Recommended answer

In pandas, is inplace = True considered harmful, or not?

Yes, it is. Not just harmful, but quite harmful. This GitHub issue proposes that the inplace argument be deprecated API-wide sometime in the near future. In a nutshell, here is everything wrong with the inplace argument:

  • inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
  • inplace does not work with method chaining
  • inplace can lead to the dreaded SettingWithCopyWarning when called on a DataFrame column, and may sometimes fail to update the column in place

The pain points above are all common pitfalls for beginners, so removing this option would simplify the API greatly.

Let's take a look at the points above in more depth.

Performance
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In general, there are no performance benefits to using inplace=True. Most in-place and out-of-place versions of a method create a copy of the data anyway, with the in-place version automatically assigning the copy back. The copy cannot be avoided.
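
One way to check this claim on your own data is to time both variants. The harness below is only a rough sketch: the frame size, column names, and run count are arbitrary assumptions, and results will vary with the pandas version and the method being tested. Because the in-place variant needs a fresh copy per run, the cost of that copy is measured separately as a baseline.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame; size and column names are arbitrary.
base = pd.DataFrame(
    {"key": np.arange(10_000), "value": np.random.rand(10_000)}
).set_index("key")


def out_of_place():
    # Returns a new DataFrame and leaves `base` untouched.
    return base.reset_index()


def in_place():
    # Work on a fresh copy so every run starts from the same state.
    tmp = base.copy()
    tmp.reset_index(inplace=True)
    return tmp


def copy_only():
    # Baseline: cost of the defensive copy used in in_place().
    return base.copy()


runs = 20
t_out = timeit.timeit(out_of_place, number=runs)
t_in = timeit.timeit(in_place, number=runs)
t_copy = timeit.timeit(copy_only, number=runs)

print(f"out of place:          {t_out / runs:.6f} s per call")
print(f"in place (incl. copy): {t_in / runs:.6f} s per call")
print(f"copy baseline:         {t_copy / runs:.6f} s per call")
```

Subtracting the copy baseline from the in-place figure gives a rough estimate of the in-place call itself, which can then be compared with the out-of-place timing.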

Method chaining
inplace=True hinders method chaining. Contrast

result = df.some_function1().reset_index().some_function2()

as opposed to

temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
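
The same contrast with real methods might look like the sketch below. The frame and the particular chain (dropna, reset_index, rename) are assumptions chosen for illustration; they stand in for the placeholder some_function1 and some_function2 calls above.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [3.0, None, 1.0], "b": ["x", "y", "z"]})

# Chained: each step returns a new DataFrame that feeds the next call.
chained = df.dropna().reset_index(drop=True).rename(columns={"a": "value"})

# With inplace=True, every step becomes its own statement.
temp = df.dropna()
temp.reset_index(drop=True, inplace=True)
temp.rename(columns={"a": "value"}, inplace=True)

# An inplace call returns None, so nothing can be chained onto it.
returned = df.dropna().reset_index(drop=True, inplace=True)
```

Both approaches end with the same data, but only the chained form can be written as a single expression.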

Unintended pitfalls
One final caveat to keep in mind is that calling a method with inplace=True can trigger the SettingWithCopyWarning:

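import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# df2 is a filtered selection of df, which pandas may treat as a copy of a slice
df2 = df[df['a'] > 1]
# Replacing in place on a column of df2 triggers the warning
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame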

This can cause unexpected behavior: the replacement may be applied to a temporary copy, so the column in df2 may not be updated.
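
A common way to sidestep this pitfall is to take an explicit copy of the filtered frame and assign the result of replace() back to the column instead of using inplace=True. The sketch below reuses the frame from the example above. It is one possible approach, not something the original answer prescribes.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# Make the intent explicit: df2 is an independent copy of the selection.
df2 = df[df['a'] > 1].copy()

# Assign the result back instead of relying on inplace=True.
df2['b'] = df2['b'].replace({'x': 'abc'})
```

With the explicit copy, df2 is updated and the original df is left unchanged.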
