What rules does Pandas use to generate a view vs a copy?

Question

I'm confused about the rules Pandas uses when deciding that a selection from a dataframe is a copy of the original dataframe, or a view on the original.

If I have, for example,

df = pd.DataFrame(np.random.randn(8,8), columns=list('ABCDEFGH'), index=range(1,9))

I understand that a query returns a copy so that something like

foo = df.query('2 < index <= 5')
foo.loc[:,'E'] = 40

will have no effect on the original dataframe, df. I also understand that scalar or named slices return a view, so that assignments to these, such as

df.iloc[3] = 70

df.loc[1,'B':'E'] = 222

will change df. But I'm lost when it comes to more complicated cases. For example,

df[df.C <= df.B] = 7654321

changes df, but

df[df.C <= df.B].loc[:,'B':'E']

does not.

Is there a simple rule that Pandas is using that I'm just missing? What's going on in these specific cases; and in particular, how do I change all values (or a subset of values) in a dataframe that satisfy a particular query (as I'm attempting to do in the last example above)?

Note: This is not the same as this question; and I have read the documentation, but am not enlightened by it. I've also read through the "Related" questions on this topic, but I'm still missing the simple rule Pandas is using, and how I'd apply it to — for example — modify the values (or a subset of values) in a dataframe that satisfy a particular query.

Recommended answer

Here are the rules; each later rule overrides the earlier ones:

  • All operations generate a copy.

  • If inplace=True is provided, the operation modifies the object in place; only some operations support this.

  • An indexer that sets, e.g. .loc/.iloc/.iat/.at, will set in place.

  • An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example above uses .query; this always returns a copy because it is evaluated by numexpr.)

  • An indexer that gets on a multiple-dtyped object is always a copy.
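To make the rules concrete, here is a small sketch that exercises the ones that behave the same way regardless of memory layout. The seed, the `renamed` and `mixed` frames, and the specific values written are illustrative assumptions, not part of the question. The "single-dtyped get is usually a view" rule is left out on purpose, since the answer itself calls it unreliable.

```python
import warnings

import numpy as np
import pandas as pd

# Older pandas versions may emit SettingWithCopyWarning for the writes to
# derived objects below; silenced here only to keep the output clean.
warnings.simplefilter("ignore")

# Same shape as the frame in the question; the seed is only for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 8)),
                  columns=list("ABCDEFGH"), index=range(1, 9))
original = df.copy()

# An ordinary operation such as query returns a new object,
# so writing to the result leaves df alone.
foo = df.query("2 < index <= 5")
foo.loc[:, "E"] = 40
query_left_df_alone = df.equals(original)

# inplace=True modifies the object the method is called on.
renamed = df.copy()
renamed.rename(columns={"A": "a"}, inplace=True)

# A getting indexer on a multiple-dtyped frame hands back a copy.
mixed = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
row = mixed.loc[0]
row["x"] = 99
mixed_unchanged = mixed.loc[0, "x"] == 1

# A setting indexer writes into df itself.
df.loc[3, "B"] = 70.0
df.iat[0, 0] = -1.0

print(query_left_df_alone, mixed_unchanged, df.equals(original))
```

The first two flags should come out true and the last one false: only the direct `.loc`/`.iat` assignments reach `df`.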

Your chained indexing

df[df.C <= df.B].loc[:,'B':'E']

is not guaranteed to work (and thus you should never do this).

Instead do:

df.loc[df.C <= df.B, 'B':'E']

as this is faster and will always work.
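As a sketch of how that single `.loc` call answers the "change the values that satisfy a condition" part of the question: the mask selects the rows, the label slice selects the columns, and the assignment goes straight into `df`. The seed, the value 222.0, and the helper names `mask`, `untouched_rows` and `column_a_before` are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((8, 8)),
                  columns=list("ABCDEFGH"), index=range(1, 9))

# Compute the row condition once, before anything is modified.
mask = df["C"] <= df["B"]

# Snapshots used only to show what the assignment does not touch.
untouched_rows = df.loc[~mask].copy()
column_a_before = df.loc[mask, "A"].copy()

# One indexing operation: rows where the mask is True, columns B through E.
df.loc[mask, "B":"E"] = 222.0

print(df.loc[mask, "A":"F"])
```

Rows where the mask is False, and columns outside `'B':'E'`, keep their original values.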

The chained indexing is two separate Python operations and thus cannot be reliably intercepted by pandas (you will often get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.
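The "two separate operations" point can be spelled out by writing the chain as two statements. This is only a sketch: the temporary name `tmp`, the seed, and the value 0.0 are assumptions, and warnings are suppressed solely to keep the output short.

```python
import warnings

import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.standard_normal((8, 8)),
                  columns=list("ABCDEFGH"), index=range(1, 9))
original = df.copy()
mask = df["C"] <= df["B"]

# Step 1: indexing with a boolean mask builds a new DataFrame.
tmp = df[mask]

# Step 2: the assignment targets that temporary object, not df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about this pattern
    tmp.loc[:, "B":"E"] = 0.0

chained_changed_df = not df.equals(original)

# A single .loc call is one operation on df, so the write lands in df.
df.loc[mask, "B":"E"] = 0.0

print(chained_changed_df, df.equals(original))
```

With a boolean mask the first step copies the selected rows, so the chained form leaves `df` as it was, while the single `.loc` call updates it.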
