pandas.DataFrame.copy(deep=True) doesn't actually create a deep copy
Problem description
I've been experimenting with pd.Series and pd.DataFrame for a while and ran into a strange problem. Let's say I have the following pd.DataFrame:
import pandas as pd
df = pd.DataFrame({'col': [[1, 2, 3]]})
Notice that this DataFrame has a column containing a list. I want to modify a copy of this DataFrame and return the modified version, so that the original remains unchanged. For the sake of simplicity, let's say I want to append the integer 4 to the list in its cell.
I tried the following code:
def modify(df):
    dfc = df.copy(deep=True)
    dfc['col'].iloc[0].append(4)
    return dfc
modify(df)
print(df)
The problem is that, besides the new copy dfc, the initial DataFrame df is also modified. Why? What should I do to prevent the initial DataFrame from being modified? My pandas version is 0.25.0.
Recommended answer
From the docs here, in the Notes section:
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
This is referenced again in this issue on GitHub, where the devs state that:
embedding mutable objects inside a DataFrame is an antipattern
So this function is working as the devs intend: mutable objects such as lists should not be embedded in DataFrames.
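A quick identity check makes the quoted note concrete. This is only a minimal sketch: the names shares_list and independent are illustrative and do not come from the pandas docs. It shows that the copy holds a reference to the very same list object, whereas copy.deepcopy applied to the list itself produces a separate one.

```python
import copy

import pandas as pd

df = pd.DataFrame({'col': [[1, 2, 3]]})
dfc = df.copy(deep=True)

# The copied frame stores a reference to the same list object as the original.
shares_list = dfc['col'].iloc[0] is df['col'].iloc[0]

# copy.deepcopy on the list itself yields an equal but distinct object.
independent = copy.deepcopy(df['col'].iloc[0])

print(shares_list)                          # True
print(independent is df['col'].iloc[0])     # False
```

Because the two frames share the list, an in-place operation such as append is visible through both of them.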
I couldn't find a way to get copy.deepcopy to work as intended on a DataFrame, but I did find a fairly awful workaround using pickle:
import pandas as pd
import pickle
df = pd.DataFrame({'col':[[1,2,3]]})
def modify(df):
    # Round-trip through pickle so that nested Python objects are rebuilt
    dfc = pickle.loads(pickle.dumps(df))
    print(dfc['col'].iloc[0] is df['col'].iloc[0])  # Check if we've succeeded in deepcopying
    dfc['col'].iloc[0].append(4)
    print(dfc)
    return dfc
modify(df)
print(df)
Output:
False
            col
0  [1, 2, 3, 4]
         col
0  [1, 2, 3]
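As a possible alternative to the pickle round-trip, one could deep-copy the cells of each object column individually. The sketch below is not part of the original answer and not a pandas API: deep_copy_frame is a hypothetical helper, and it assumes unique column names and that every cell value can be handled by copy.deepcopy.

```python
import copy

import pandas as pd


def deep_copy_frame(df):
    """Hypothetical helper: copy a DataFrame and deep-copy the cells of object columns."""
    dfc = df.copy(deep=True)
    for name in dfc.columns:
        # Only object columns can hold mutable Python objects such as lists.
        if dfc[name].dtype == object:
            dfc[name] = dfc[name].map(copy.deepcopy)
    return dfc


df = pd.DataFrame({'col': [[1, 2, 3]]})
dfc = deep_copy_frame(df)
dfc['col'].iloc[0].append(4)

print(dfc['col'].iloc[0])  # [1, 2, 3, 4]
print(df['col'].iloc[0])   # [1, 2, 3]
```

This keeps the copy explicit about which columns are affected, at the cost of a Python-level loop over every cell in those columns.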