Strange behavior with DataFrame copy
Problem description
Consider the following code:
In [16]: data = [['Alex',10],['Bob',12],['Clarke',13]]
In [17]: df = pd.DataFrame(data,columns=['Name','Age'])
In [18]: df
Out[18]:
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
In [19]: df_new = df
In [20]: df_new['Age'] = df_new['Age'] * 90 / 100
In [21]: df_new
     Name   Age
0    Alex   9.0
1     Bob  10.8
2  Clarke  11.7
In [22]: df
     Name   Age
0    Alex   9.0
1     Bob  10.8
2  Clarke  11.7
When I assigned new values to the Age column of the new DataFrame (df_new), the Age column of the original DataFrame (df) changed as well.
Why does this happen? Does it have something to do with the way I create a copy of the original DataFrame? It seems like the two are linked together.
Recommended answer
Use:
df_new = df.copy()
or, equivalently (deep=True is the default):
df_new = df.copy(deep=True)
This is the standard way of making a copy of a pandas object’s indices and data.
From the pandas documentation:
When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object.
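As a minimal sketch of what the quoted passage implies, the question's own steps can be rerun with df.copy() in place of the plain assignment. The data and column names are taken from the question; nothing else is assumed.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# df.copy() builds a new DataFrame with its own data and index
df_new = df.copy()
df_new['Age'] = df_new['Age'] * 90 / 100

print(df['Age'].tolist())      # ages in the original are unchanged
print(df_new['Age'].tolist())  # only the copy holds the scaled ages
```

With the copy in place, the assignment to df_new['Age'] no longer touches df.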
Explanation
If you look at the object IDs of the various DataFrames you create, you can clearly see what is happening.
When you write df_new = df, you are not creating a new DataFrame. You are creating a variable named df_new and binding it to the same object that df refers to, so both names have the same id.
Example
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
df_new = df
df_copy = df.copy()
print("ID of old df: {}".format(id(df)))
print("ID of new df: {}".format(id(df_new)))
print("ID of copy df: {}".format(id(df_copy)))
Output
ID of old df: 113414664
ID of new df: 113414664
ID of copy df: 113414832
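The same check can be written without comparing printed IDs, since Python's is operator tests object identity directly. The sketch below uses a shortened two-row frame purely for illustration; it is not part of the original answer.

```python
import pandas as pd

# Shortened illustrative data (an assumption, not the question's full table)
df = pd.DataFrame({'Name': ['Alex', 'Bob'], 'Age': [10, 12]})

df_new = df          # a second name bound to the same object
df_copy = df.copy()  # a separate object holding copied data

print(df_new is df)   # True: both names refer to one DataFrame
print(df_copy is df)  # False: the copy is a different object

# A change made through the alias is visible through the original name
df_new['Age'] = df_new['Age'] + 1
print(df['Age'].tolist())  # [11, 13]
```

Because df_new and df are one object, there is no "original" left to protect once the assignment runs. Only a copy made beforehand stays independent.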