Pandas DataFrame mutability
Question
I am pretty new to pandas DataFrames, and I would appreciate it if someone could briefly explain DataFrame mutability using the following example:
import numpy as np
import pandas as pd

d1 = pd.date_range('1/1/2016', periods=10, freq='W')  # 10 weekly dates
col1 = ['open', 'high', 'low', 'close']
list1 = np.random.rand(10, 4)
df1 = pd.DataFrame(list1, index=d1, columns=col1)
To my understanding, df1 is currently a reference to a DataFrame object.
If I pass df1, or a slice of df1 (e.g. df1.iloc[2:3,1:2]), as the input to a new DataFrame (e.g. df2=pd.DataFrame(df1)), is df2 a new DataFrame instance, or does it still refer to df1, so that df1 is exposed to changes made through df2?
Any other points I should pay attention to regarding DataFrame mutability would also be very much appreciated.
Recommended answer
This:
df2 = pd.DataFrame(df1)
constructs a new DataFrame. The constructor has a copy parameter, and for DataFrame input its default behaves like copy=False (recent pandas versions spell the default as None). According to the documentation, the parameter means:
> Copy data from inputs. Only affects DataFrame / 2d ndarray input
So by default, the data is shared between df2 and df1. If you do not want any sharing, but rather a complete copy, do this:
df2 = pd.DataFrame(df1, copy=True)
Or, more concisely and idiomatically:
df2 = df1.copy()
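A minimal sketch of what "independent copy" means in practice, rebuilding df1 from the question. The variable names original_value, copy_is_independent, and shares_buffer are illustrative only, and the buffer check relies on np.shares_memory as a diagnostic rather than as anything pandas itself guarantees.

```python
import numpy as np
import pandas as pd

# Rebuild df1 as in the question.
d1 = pd.date_range("1/1/2016", periods=10, freq="W")
df1 = pd.DataFrame(np.random.rand(10, 4), index=d1,
                   columns=["open", "high", "low", "close"])

df2 = df1.copy()  # deep copy by default

# Remember a value in df1, then overwrite the same cell in the copy.
original_value = df1.iloc[0, 0]
df2.iloc[0, 0] = -1.0

# df1 is untouched, and the two frames do not share a value buffer.
copy_is_independent = bool(df1.iloc[0, 0] == original_value)
shares_buffer = bool(np.shares_memory(df1.to_numpy(), df2.to_numpy()))

print(copy_is_independent, shares_buffer)
```

Writing to df2 after .copy() should never be visible through df1, whichever pandas version is in use.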
If you do this:
df2 = df1.iloc[2:3,1:2].copy()
you will again get an independent copy. But if you do this:
df2 = pd.DataFrame(df1.iloc[2:3,1:2])
it will probably share the data, but this style makes it quite unclear whether you intend to modify the DataFrame, so I suggest not writing such code. Instead, if you do not want a copy, just say so:
df2 = df1.iloc[2:3,1:2]
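If it is unclear whether two frames share data, one way to check is to compare their underlying buffers. The sketch below is an assumption-laden diagnostic, not an official pandas facility: shares_data is a hypothetical helper, and to_numpy() is not guaranteed to return a view, so a False result can also mean that the conversion itself copied. The results for the constructor and the plain slice can differ between pandas versions and settings (for example with Copy-on-Write enabled), so no expected output is shown for them.

```python
import numpy as np
import pandas as pd

d1 = pd.date_range("1/1/2016", periods=10, freq="W")
df1 = pd.DataFrame(np.random.rand(10, 4), index=d1,
                   columns=["open", "high", "low", "close"])


def shares_data(a, b):
    """Hypothetical helper: True if the frames' value buffers overlap."""
    return bool(np.shares_memory(a.to_numpy(), b.to_numpy()))


wrapped = pd.DataFrame(df1)          # constructor without copy=True
sliced = df1.iloc[2:3, 1:2]          # plain positional slice
copied = df1.iloc[2:3, 1:2].copy()   # explicit independent copy

# Only the explicit copy is certain not to share memory with df1.
print("constructor shares data:", shares_data(df1, wrapped))
print("slice shares data:", shares_data(df1, sliced))
print("copy shares data:", shares_data(df1, copied))
```

Whatever the first two lines print on a given installation, the explicit .copy() is the only one of the three forms that states the intent unambiguously.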
In summary: if you want a reference to existing data, do not call pd.DataFrame() or any other method at all. If you want an independent copy, call .copy().