Deep copy of Pandas dataframes and dictionaries

Question

I'm creating a small Pandas dataframe:

import pandas as pd
df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})

I take a deepcopy of that df. I'm not using the Pandas method but general Python, right?

import copy
df_copy = copy.deepcopy(df)

df_copy.head() gives the following:

        colA
0  [a, b, c]

Then I put these values into a dictionary:

mydict = df_copy.to_dict()

That dictionary looks like this:

{'colA': {0: ['a', 'b', 'c']}}

Finally, I remove one item of the list:

mydict['colA'][0].remove("b")
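Put together, the steps above form a self-contained reproduction. This is a sketch assuming a recent pandas release; `df.at[0, "colA"]` is used only to read back the list stored in the single cell.

```python
import copy

import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})
df_copy = copy.deepcopy(df)     # a new DataFrame object
mydict = df_copy.to_dict()      # {'colA': {0: ['a', 'b', 'c']}}
mydict["colA"][0].remove("b")   # mutate the list through the dictionary

# The same list object sits in mydict, df_copy and df, so all three see the change.
print(df_copy.at[0, "colA"])    # ['a', 'c']
print(df.at[0, "colA"])         # ['a', 'c']
```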

I'm surprised that the values in df_copy are updated. I'm even more confused that the values in the original dataframe are updated too! Both dataframes now look like this:

     colA
0  [a, c]

I understand Pandas doesn't really do deepcopy, but this wasn't a Pandas method. My questions are:

1) How can I build a dictionary from a dataframe so that changing the dictionary doesn't update the dataframe?

2) How can I take a copy of a dataframe that is completely independent?

Thanks for your help!

Cheers, Nicolas

Answer

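When you copy a DataFrame that contains Python objects, a deep copy copies the underlying data array, but it does not copy the Python objects inside it recursively; only references to them are copied. Updating a nested object (such as the list in your cell) is therefore reflected in the deep copy. The same rule applies when you create a dictionary from a DataFrame: to_dict() stores references to the same objects.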
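Comparing object identities with `is` makes this sharing visible. A minimal sketch, assuming a recent pandas version; the variable names are illustrative only.

```python
import copy

import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})
inner = df.at[0, "colA"]  # the list object stored in the only cell

# Each of these creates a new container, yet all of them reference the same list.
deep_copied = copy.deepcopy(df)   # delegates to DataFrame.__deepcopy__
pandas_copy = df.copy()           # deep=True is the default
as_dict = df.to_dict()

print(deep_copied.at[0, "colA"] is inner)  # True
print(pandas_copy.at[0, "colA"] is inner)  # True
print(as_dict["colA"][0] is inner)         # True
print(deep_copied is df)                   # False: the frame itself is new
```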

And copy.deepcopy doesn't solve this problem either. When applied to an object, copy.deepcopy looks for a __deepcopy__ method on it and, if one exists, delegates the copying to that method. DataFrame implements __deepcopy__ by calling self.copy(deep=True), which does not copy nested Python objects recursively. To take a copy of a DataFrame that is completely independent, in your case you may use the following (note that this is not a recommended practice: putting mutable objects inside a DataFrame is an antipattern):

df_copy = pd.DataFrame(data=copy.deepcopy(df.values), index=df.index, columns=df.columns)
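To check that the rebuilt frame is really independent, mutate its list and compare with the original. A minimal sketch; the extra integer column `n` is a hypothetical addition used to show one caveat of going through `df.values`: a frame with mixed dtypes is flattened into a single object array, so every column of the copy comes back as `object`. Calling `infer_objects()` afterwards is one way to recover the numeric dtypes.

```python
import copy

import pandas as pd

# "n" is a hypothetical extra column, added only to illustrate the dtype caveat.
df = pd.DataFrame(data={"colA": [["a", "b", "c"]], "n": [1]})

df_copy = pd.DataFrame(
    data=copy.deepcopy(df.values),  # NumPy deep-copies each element of an object array
    index=df.index,
    columns=df.columns,
)
df_copy.at[0, "colA"].remove("b")

print(df.at[0, "colA"])       # ['a', 'b', 'c']: the original is untouched
print(df_copy.at[0, "colA"])  # ['a', 'c']
print(df_copy["n"].dtype)     # object, because df.values mixed the dtypes

restored = df_copy.infer_objects()  # soft-converts the object column "n" back to integers
print(restored["n"].dtype)
```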

For a dictionary, you may use the same trick:

mydict = pd.DataFrame(data=copy.deepcopy(df.values), index=df.index, columns=df.columns).to_dict()
mydict['colA'][0].remove("b")
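A simpler alternative for question 1, swapped in here rather than taken from the answer above, is to deep-copy the dictionary that `to_dict()` returns. `copy.deepcopy` on a plain dict recurses into the nested lists, so nothing is shared with the frame. A minimal sketch:

```python
import copy

import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})

# to_dict() shares the lists with df; deep-copying the plain dict breaks that link.
mydict = copy.deepcopy(df.to_dict())
mydict["colA"][0].remove("b")

print(mydict["colA"][0])  # ['a', 'c']
print(df.at[0, "colA"])   # ['a', 'b', 'c']: the DataFrame is unchanged
```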

There is also a standard, if somewhat hacky, way to deep copy Python objects: a pickle round trip.

import pickle
df_copy = pickle.loads(pickle.dumps(df))  
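The pickle round trip can be checked the same way. Unlike rebuilding the frame from `df.values`, it serializes the whole DataFrame, so the index and the column dtypes survive as well. A minimal sketch:

```python
import pickle

import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})

# Serializing and deserializing rebuilds every nested object from scratch.
df_copy = pickle.loads(pickle.dumps(df))
df_copy.at[0, "colA"].remove("b")

print(df.at[0, "colA"])       # ['a', 'b', 'c']
print(df_copy.at[0, "colA"])  # ['a', 'c']
```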

Hope I've answered your question. Feel free to ask for any clarifications, if needed.
