Re-assignment in Pandas: Copy or view?
Question
Say we have the following dataframe:
import pandas as pd
from numpy.random import randn

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': randn(8), 'D': randn(8)})
which looks like this:
> df
     A      B         C         D
0  foo    one  0.846192  0.478651
1  bar    one  2.352421  0.141416
2  foo    two -1.413699 -0.577435
3  bar  three  0.569572 -0.508984
4  foo    two -1.384092  0.659098
5  bar    two  0.845167 -0.381740
6  foo    one  3.355336 -0.791471
7  foo  three  0.303303  0.452966
Then I do the following:
df2 = df
df = df[df['C']>0]
If you now look at df and df2, you will see that df2 holds the original data, whereas df was updated to keep only the rows where C is greater than 0.
I thought Pandas wasn't supposed to make a copy in an assignment like df2 = df, and that it would only make copies with either:
- df2 = df.copy(deep=True)
- df2 = copy.deepcopy(df)
What happened above, then? Did df2 = df make a copy? I presume the answer is no, so it must have been df = df[df['C']>0] that made a copy. I also presume that, if I hadn't had df2 = df above, there would have been a copy floating in memory without any reference to it. Is that correct?
Note: I read through Returning a view versus a copy and I wonder if the following:
Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy.
explains this behavior.
Recommended answer
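It's not that df2 = df is making the copy; it's that df = df[df['C'] > 0] is returning a copy. The assignment df2 = df only binds a second name to the same DataFrame object. The boolean indexing builds a new DataFrame, and the assignment rebinds the name df to it, while df2 still refers to the original.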
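Just print out the ids and you'll see:
print(id(df))         # id of the original DataFrame
df2 = df
print(id(df2))        # same id: df2 is another name for the same object
df = df[df['C'] > 0]
print(id(df))         # different id: df now refers to a new DataFrame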
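The same check can be written with the `is` operator instead of comparing printed ids. This is only a minimal sketch: the small four-row frame and the names `same_before` and `same_after` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical toy data with fixed values so the result is reproducible.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                   "C": [0.5, -1.0, 2.0, -0.3]})

df2 = df                   # binds a second name to the same object, no copy
same_before = df2 is df    # both names refer to one DataFrame

df = df[df["C"] > 0]       # boolean indexing builds a new DataFrame; df is rebound
same_after = df2 is df     # df2 still refers to the original object

print(same_before, same_after)  # True False
print(len(df2), len(df))        # 4 2
```

Here df2 keeps all four rows because nothing modified the original object; only the name df was pointed at the filtered result.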