Editing Original DataFrame After Making a Copy but Before Editing the Copy Changes the Copy


Question

I am trying to understand how copying a pandas DataFrame works. When I assign a copy of an object in Python, I am not used to changes to the original object affecting copies of that object. For example:

x = 3
y = x
x = 4
print(y)  # prints 3

Although x has subsequently been changed, y remains the same. In contrast, when I make changes to a pandas DataFrame df after assigning it to a copy df1, the copy is also affected by the changes to the original DataFrame.

import pandas as pd
import numpy as np

def minusone(x):
    return int(x) - 1

df = pd.DataFrame({"A": [10,20,30,40,50], "B": [20, 30, 10, 40, 50], "C": [32, 234, 23, 23, 42523]})

df1 = df


print(df1['A'])

0    10
1    20
2    30
3    40
4    50
Name: A, dtype: int64

df['A'] = np.vectorize(minusone)(df['A'])

print(df1['A'])

0     9
1    19
2    29
3    39
4    49
Name: A, dtype: int64

The solution appears to be making a deep copy with copy.deepcopy(), but because this behavior differs from what I am used to in Python, I was wondering whether someone could explain the reasoning behind the difference, or whether it is a bug.

Recommended answer

In your first example, you did not make a change to the value of x. You assigned a new value to x.

In your second example, you did modify the value of df, by changing one of its columns.

You can see the same effect with built-in types too:

>>> x = []
>>> y = x
>>> x.append(1)
>>> y
[1]

The behavior is not specific to pandas; it is fundamental to Python. There are many, many questions on this site about this same issue, all stemming from the same misunderstanding. The syntax

barename = value

does not have the same behavior as any other construct in Python.

When you use name[key] = value, name.attr = value, or name.methodcall(), you may be mutating the object that name refers to, you may be copying something, and so on. With name = value (where name is a single identifier: no dots, no brackets, etc.), you never mutate anything and never copy anything; you only make name refer to the object on the right-hand side.
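The distinction between rebinding a name and mutating an object can be sketched with plain lists. The variable names here (a, b, c, d, rebound_result, mutated_result) are made up for illustration and do not come from the question:

```python
# Rebinding: `name = value` points the name at a different object.
a = [1, 2, 3]
b = a              # b and a now refer to the same list; nothing is copied
a = [4, 5, 6]      # a is rebound to a new list; b still refers to the old one
rebound_result = (b, a is b)

# Mutation: `name[key] = value` changes the object itself.
c = [1, 2, 3]
d = c              # d and c refer to the same list
c[0] = 99          # the shared list is modified in place
mutated_result = (d, c is d)

print(rebound_result)  # ([1, 2, 3], False)
print(mutated_result)  # ([99, 2, 3], True)
```

The `is` operator checks whether two names refer to the same object, which is a quick way to tell an alias from a copy.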

In your first example, you used the syntax x = .... In your second example, you used the syntax df['A'] = .... These are not the same syntax, so you can't assume they have the same behavior.

The way to make a copy depends on the kind of object you're trying to copy. For your case, use df1 = df.copy().
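As a minimal sketch of the difference, the following compares plain assignment with df.copy() on a cut-down version of the question's DataFrame. The names alias, snapshot, alias_values and snapshot_values are illustrative only, and the column update uses ordinary arithmetic in place of the question's np.vectorize call:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30, 40, 50], "B": [20, 30, 10, 40, 50]})

alias = df             # plain assignment: a second name for the same object
snapshot = df.copy()   # independent copy (deep=True is the default)

df["A"] = df["A"] - 1  # modify a column of the original DataFrame

alias_values = alias["A"].tolist()
snapshot_values = snapshot["A"].tolist()

print(alias is df)       # True
print(alias_values)      # [9, 19, 29, 39, 49]
print(snapshot_values)   # [10, 20, 30, 40, 50]
```

Because alias is the same object as df, it sees the change; snapshot was copied before the edit and keeps the original values.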
