Deep copy of Pandas dataframes and dictionaries
Question
I'm creating a small Pandas dataframe:
import pandas as pd
df = pd.DataFrame(data={'colA': [["a", "b", "c"]]})
I take a deep copy of that df. I'm not using the Pandas method here, just plain Python, right?
import copy
df_copy = copy.deepcopy(df)
df_copy.head() gives the following:
        colA
0  [a, b, c]
Then I put these values into a dictionary:
mydict = df_copy.to_dict()
That dictionary looks like this:
{'colA': {0: ['a', 'b', 'c']}}
Finally, I remove one item from the list:
mydict['colA'][0].remove("b")
I'm surprised that the values in df_copy are updated. I'm even more confused that the values in the original dataframe are updated too! Both dataframes now look like this:
     colA
0  [a, c]
I understand that Pandas doesn't really do a deep copy, but I wasn't using a Pandas method. My questions are:
1) How can I build a dictionary from a dataframe so that changing the dictionary doesn't update the dataframe?
2) How can I take a copy of a dataframe that is completely independent?
Thanks for your help!
Cheers, Nicolas
Recommended answer
When a DataFrame contains Python objects, a deep copy copies the data but does not do so recursively: the copy holds references to the same nested objects. Updating a nested object will therefore be reflected in the deep copy as well. The same rule applies when you create a dictionary from a DataFrame.
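A minimal sketch of the sharing described above, using the question's own data. The variable names `original_list`, `copied_list` and `dict_list` are introduced here for illustration only. It compares object identity rather than printed values, so it shows that all three containers point at one and the same list:

```python
import copy
import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})
df_copy = copy.deepcopy(df)
mydict = df_copy.to_dict()

# Pull the nested list out of each container.
original_list = df.loc[0, "colA"]
copied_list = df_copy.loc[0, "colA"]
dict_list = mydict["colA"][0]

# "is" checks identity: True means both names refer to the same list object.
shared_with_copy = copied_list is original_list
shared_with_dict = dict_list is original_list
print(shared_with_copy, shared_with_dict)
```

If both checks come back true, removing an item through any one of the three names changes what the other two display.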
copy.deepcopy doesn't solve this problem either. When applied to an object, it looks up a __deepcopy__ method on the object's class and calls it. For a DataFrame instance, __deepcopy__ does not work recursively: it delegates to DataFrame.copy(deep=True), which copies the data but not the nested Python objects. To get a completely independent copy of the DataFrame in your case, you can use the following (note that this is not a recommended practice: putting mutable objects inside a DataFrame is an antipattern):
df_copy = pd.DataFrame(data=copy.deepcopy(df.values), index=df.index, columns=df.columns)
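As a quick check of this approach, here is a sketch that rebuilds the frame from a deep-copied values array and then mutates the copy. The name `independent` and the explicit `index=df.index` argument are additions made for this illustration (passing the index is assumed to be desirable when the frame does not use the default index):

```python
import copy
import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})

# Deep-copying the object array also deep-copies the lists stored in it.
independent = pd.DataFrame(
    data=copy.deepcopy(df.values),
    index=df.index,
    columns=df.columns,
)

# Mutate the list held by the copy only.
independent.loc[0, "colA"].remove("b")

original_after = df.loc[0, "colA"]
copy_after = independent.loc[0, "colA"]
print(original_after, copy_after)
```

The original list should keep all three items while the copy loses "b".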
For a dictionary, you can use the same trick:
mydict = pd.DataFrame(data=copy.deepcopy(df_copy.values), index=df_copy.index, columns=df_copy.columns).to_dict()
mydict['colA'][0].remove("b")
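A possibly simpler variant for the dictionary case, which is not the method used above: since the result of to_dict() is a plain Python dict, copy.deepcopy applied to that dict does recurse into the nested lists. This is a sketch under that assumption, with the question's data:

```python
import copy
import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})

# Deep-copy the plain dict, not the DataFrame: this copies the nested lists too.
mydict = copy.deepcopy(df.to_dict())
mydict["colA"][0].remove("b")

original_after = df.loc[0, "colA"]
print(original_after, mydict)
```

Here only the dictionary's list should change, because the dataframe and the dictionary no longer share it.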
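Alternatively, you can round-trip the DataFrame through pickle. Serializing and deserializing rebuilds the nested objects, so the result does not share them with the original:
import pickle
df_copy = pickle.loads(pickle.dumps(df))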
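The pickle round trip can be checked the same way. This is only a sketch on the question's data; it assumes every object stored in the DataFrame is picklable, which holds for plain lists of strings:

```python
import pickle
import pandas as pd

df = pd.DataFrame(data={"colA": [["a", "b", "c"]]})

# Serialize and immediately deserialize to build a fresh object graph.
df_copy = pickle.loads(pickle.dumps(df))
df_copy.loc[0, "colA"].remove("b")

original_after = df.loc[0, "colA"]
copy_after = df_copy.loc[0, "colA"]
print(original_after, copy_after)
```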
Hope I've answered your question. Feel free to ask for any clarifications, if needed.