What rules does Pandas use to generate a view vs a copy?


Question


I'm confused about the rules Pandas uses when deciding whether a selection from a dataframe is a copy of the original dataframe or a view on the original.

If I have, for example,

df = pd.DataFrame(np.random.randn(8,8), columns=list('ABCDEFGH'), index=[1, 2, 3, 4, 5, 6, 7, 8])

I understand that a query returns a copy so that something like

foo = df.query('2 < index <= 5')
foo.loc[:,'E'] = 40

will have no effect on the original dataframe, df. I also understand that scalar or named slices return a view, so that assignments to these, such as

df.iloc[3] = 70

or

df.ix[1,'B':'E'] = 222

will change df. But I'm lost when it comes to more complicated cases. For example,

df[df.C <= df.B] = 7654321

changes df, but

df[df.C <= df.B].ix[:,'B':'E']

does not.

Is there a simple rule that Pandas is using that I'm just missing? What's going on in these specific cases, and in particular, how do I change all values (or a subset of values) in a dataframe that satisfy a particular query (as I'm attempting to do in the last example above)?


Note: This is not the same as this question, and I have read the documentation but am not enlightened by it. I've also read through the "Related" questions on this topic, but I'm still missing the simple rule Pandas is using, and how I'd apply it to, for example, modify the values (or a subset of values) in a dataframe that satisfy a particular query.

Answer

Here are the rules; later ones override earlier ones:

  • All operations generate a copy

  • If inplace=True is provided, it will modify in-place; only some operations support this

  • An indexer that sets, e.g. .loc/.ix/.iloc/.iat/.at, will set in place.

  • An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example from above is for .query; this will always return a copy, as it is evaluated by numexpr.)

  • An indexer that gets on a multiple-dtyped object is always a copy.
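The parts of these rules that can be checked reliably are sketched below. This is an illustration under some assumptions, not a statement of pandas internals: it uses deterministic data instead of the random frame from the question, it uses `.loc` where the question used `.ix` (`.ix` was removed in pandas 1.0), and the variable names are made up for the example. The "almost always a view" case is deliberately left out because, as the rule itself says, it is not reliable. Older pandas versions may emit a SettingWithCopyWarning on some of these lines.

```python
import numpy as np
import pandas as pd

# Single-dtype frame with deterministic values (assumption: stands in for the
# random 8x8 frame from the question).
df = pd.DataFrame(np.arange(64, dtype=float).reshape(8, 8),
                  columns=list("ABCDEFGH"), index=range(1, 9))
original = df.copy()

# .query returns a copy, so writing to its result leaves df alone.
foo = df.query("2 < index <= 5")
foo.loc[:, "E"] = 40
query_left_df_untouched = bool(df.equals(original))

# An indexer that sets modifies df in place.
df.loc[1, "B":"E"] = 222
setter_changed_df = bool(df.loc[1, "B":"E"].eq(222).all())

# Getting a row from a multiple-dtype frame gives a copy:
# writing to the row does not reach the frame.
mixed = pd.DataFrame({"num": [1, 2, 3], "txt": ["a", "b", "c"]})
row = mixed.iloc[0]
row["num"] = 99
multi_dtype_get_was_copy = bool(mixed.loc[0, "num"] == 1)

print(query_left_df_untouched, setter_changed_df, multi_dtype_get_was_copy)
```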

Your example of chained indexing

df[df.C <= df.B].ix[:,'B':'E']

is not guaranteed to work (and thus you should never do this).

Instead do:

df.ix[df.C <= df.B, 'B':'E']

as this is faster and will always work.
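As a minimal sketch of that single-indexer pattern, the example below changes a subset of columns, and then all columns, in the rows that satisfy a condition. The small frame and its values are made up for illustration, and `.loc` is used because `.ix` no longer exists in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical small frame; rows 0 and 2 satisfy C <= B.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [5, 1, 7, 2],
    "C": [3, 4, 6, 9],
    "D": [0, 0, 0, 0],
    "E": [1, 1, 1, 1],
    "F": [2, 2, 2, 2],
})
mask = df["C"] <= df["B"]  # computed once, before any assignment

# Subset of values: matching rows, columns B through E only.
subset = df.copy()
subset.loc[mask, "B":"E"] = -1

# All values: matching rows, every column.
everything = df.copy()
everything.loc[mask] = 7654321

print(subset)
print(everything)
```

Row selection and column selection go into one `.loc[...]` call, so the assignment is a single operation on the frame itself rather than on an intermediate result.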

Chained indexing is two separate Python operations and thus cannot be reliably intercepted by pandas (you will often get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.
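To make the "two separate operations" point concrete, the sketch below spells the chained form out step by step and compares it with the single-indexer form. The frame, the `tmp` variable, and the flag names are hypothetical, and `.loc` stands in for `.ix`. Depending on the pandas version, the write to `tmp` may emit a warning.

```python
import pandas as pd

# Hypothetical small frame; rows 0 and 2 satisfy C <= B.
df = pd.DataFrame({"B": [5, 1, 7, 2], "C": [3, 4, 6, 9], "D": [0, 0, 0, 0]})
before = df.copy()
mask = df["C"] <= df["B"]

# Chained form, df[mask].loc[:, "B":"D"] = -1, written as the two steps
# Python actually performs:
tmp = df[mask]              # step 1: df.__getitem__(mask) builds a new DataFrame
tmp.loc[:, "B":"D"] = -1    # step 2: the write lands on that temporary object
chained_changed_df = not df.equals(before)

# Single-indexer form: one __setitem__ call on df.loc, so pandas sees the
# row selection, the column selection, and the value together.
df.loc[mask, "B":"D"] = -1
single_changed_df = not df.equals(before)

print(chained_changed_df, single_changed_df)
```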
