Pandas - is inplace = True considered harmful or not?

Question

This has been discussed before, but with conflicting answers:

  • in-place is good!
  • in-place is bad!

What I'd like to know is:

  • Why is inplace = False the default behavior?
  • When is it good to change it? (well, I'm allowed to change it, so I guess there's a reason).
  • Is this a safety issue? that is, can an operation fail/misbehave due to inplace = True?
  • Can I know in advance if a certain inplace = True operation will "really" be carried out in-place?

What I've understood so far:

  • Many Pandas operations have an inplace parameter, always defaulting to False, meaning the original DataFrame is untouched, and the operation returns a new DF.
  • When setting inplace = True, the operation might work on the original DF, but it might still work on a copy behind the scenes, and just reassign the reference when done.

Pros of inplace = False:

  • Allows chained/functional syntax: df.dropna().rename().sum()... which is nice, and offers a chance for lazy evaluation or a more efficient re-ordering (though I don't think Pandas is doing this).
  • When using inplace = True on an object which is potentially a slice/view of an underlying DF, Pandas has to do a SettingWithCopy check, which is expensive. inplace = False avoids this.
  • Consistent & predictable behavior behind the scenes.

Pros of inplace = True:

  • Can be both faster and less memory hogging (the first link shows reset_index() runs twice as fast and uses half the peak memory!).
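
To make the chaining point above concrete, here is a minimal sketch. The sample frame and the new column name 'value' are made up for illustration. It only shows which object each style leaves you holding, and says nothing about speed or memory.

```python
import pandas as pd

# Hypothetical sample data: one missing value in column 'a'.
df = pd.DataFrame({'a': [3.0, None, 1.0], 'b': ['x', 'y', 'z']})

# Default (inplace=False): each step returns a new DataFrame, so calls chain.
total = df.dropna().rename(columns={'a': 'value'})['value'].sum()

# The original frame is untouched by the chain: still three rows, still 'a'.
rows_after_chain = len(df)
columns_after_chain = list(df.columns)

# inplace=True is used for its side effect on df, not for a return value,
# so it is written as a standalone statement rather than as part of a chain.
df.dropna(inplace=True)
rows_after_inplace = len(df)
```

After the chained expression, df still has all three rows. After the inplace=True call, the row with the missing value is gone from df itself.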

So, putting the copy-vs-view issue aside, it seems more performant to always use inplace = True, unless specifically writing a chained statement. But that's not the default Pandas opts for, so what am I missing?

Recommended answer

If inplace were the default, then the DataFrame would be mutated for all names that currently reference it.

A simple example, say I have a df:

import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

Now it's very important that the DataFrame retains that row order - let's say it comes from a data source where insertion order is key, for instance.

However, I now need to do some operations which require a different sort order:

def f(frame):
    df = frame.sort_values('a')
    # if we did frame.sort_values('a', inplace=True) here without
    # making it explicit - our caller is going to wonder what happened
    # do something
    return df

That's fine - my original df remains the same. However, if inplace=True were the default, my original df would now be sorted as a side effect of f(). I'd have to trust every function I pass it to to remember not to mutate it when I'm not expecting that, instead of having functions mutate it only when they deliberately ask to. So it's better that anything that can mutate an object in place does so explicitly, to at least make it more obvious what's happened and why.
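
As a rough illustration of that difference, the sketch below runs two variants of f() side by side. The names f_copy and f_inplace are hypothetical and exist only for this comparison.

```python
import pandas as pd

def f_copy(frame):
    # Default behaviour: sort_values returns a new, sorted DataFrame.
    return frame.sort_values('a')

def f_inplace(frame):
    # Hypothetical variant that sorts the caller's object itself.
    frame.sort_values('a', inplace=True)
    return frame

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

out_copy = f_copy(df)
order_after_copy = df['a'].tolist()     # caller's row order is preserved

out_inplace = f_inplace(df)
order_after_inplace = df['a'].tolist()  # caller's df has been reordered
```

After f_copy the caller's df is still in insertion order. After f_inplace the same df object has been sorted, even though the caller never asked for that.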

Even with basic Python builtin mutables, you can observe this:

data = [3, 2, 1]

def f(lst):
    lst.sort()
    # I meant lst = sorted(lst)
    for item in lst:
        print(item)

f(data)

for item in data:
    print(item)

# huh!? What happened to my data - why's it not 3, 2, 1?     
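
For comparison, here is a sketch of the variant that the comment in f() hints at. It uses sorted(), which builds a new list, instead of list.sort(), which reorders the list it is called on. The function name print_sorted is made up for this example.

```python
data = [3, 2, 1]

def print_sorted(lst):
    # sorted() returns a new list; the argument keeps its original order.
    ordered = sorted(lst)
    for item in ordered:
        print(item)
    return ordered

ordered = print_sorted(data)

# The caller's list is unchanged, so this loop sees the insertion order.
for item in data:
    print(item)
```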
