Confusion re: pandas copy of slice of dataframe warning
Question
I've looked through a bunch of questions and answers related to this issue, but I'm still finding that I'm getting this copy of slice warning in places where I don't expect it. Also, it's cropping up in code that was running fine for me previously, leading me to wonder if some sort of update may be the culprit.
For example, this is a set of code where all I'm doing is reading an Excel file into a pandas DataFrame and cutting down the set of columns included with the df[[]] syntax.
izmir = pd.read_excel(filepath)
izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60','MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5','MC_OLD_18>F>5',
'MC_OLD_M_Child<5','MC_OLD_F_Child<5','MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery','Date to insert','Date of Entery']]
Now, any further changes I make to this izmir_lim DataFrame raise the copy of slice warning.
izmir_lim['Age'] = izmir_lim.Age.fillna(0)
izmir_lim['Age'] = izmir_lim.Age.astype(int)
/Users/samlilienfeld/anaconda/lib/python3.5/site-packages/ipykernel/main.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
I'm confused because I thought the df[[]] column subsetting returned a copy by default. The only way I've found to suppress the warning is by explicitly adding df[[]].copy(). I could have sworn that in the past I did not have to do that and did not get the copy of slice warning.
Similarly, I have some other code that runs a function on a dataframe to filter it in certain ways:
def lim(df):
    # geography, start_date and end_date are defined elsewhere in the script
    if geography == "All":
        df_geo = df
    else:
        df_geo = df[df.center_JO == geography]
    df_date = df_geo[(df_geo.date_survey >= start_date) & (df_geo.date_survey <= end_date)]
    return df_date
df_lim = lim(df)
From this point forward, any changes I make to any of the values of df_lim raise the copy of slice warning. The only way around it that I've found is to change the function call to:
df_lim = lim(df).copy()
This just seems wrong to me. What am I missing? It seems like these use cases should return copies by default, and I could have sworn that the last time I ran these scripts I was not running into these warnings.
Do I just need to start adding .copy() all over the place? Seems like there should be a cleaner way to do this. Any insight or help is much appreciated.
Recommended answer
izmir = pd.read_excel(filepath)
izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60',
'MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5',
'MC_OLD_18>F>5','MC_OLD_M_Child<5','MC_OLD_F_Child<5',
'MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery',
'Date to insert','Date of Entery']]
izmir_lim is a view/copy of izmir. You subsequently attempt to assign to it, which is what triggers the warning. Use this instead:
izmir_lim = izmir[['Gender','Age','MC_OLD_M>=60','MC_OLD_F>=60',
'MC_OLD_M>18','MC_OLD_F>18','MC_OLD_18>M>5',
'MC_OLD_18>F>5','MC_OLD_M_Child<5','MC_OLD_F_Child<5',
'MC_OLD_M>0<=1','MC_OLD_F>0<=1','Date to Delivery',
'Date to insert','Date of Entery']].copy()
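To make the effect of .copy() concrete, here is a minimal sketch on a made-up three-row frame. The data and the extra "Other" column are assumptions for illustration only; they stand in for the real Excel file, which is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data; the real file has many more columns.
izmir = pd.DataFrame({
    "Gender": ["F", "M", "F"],
    "Age": [34.0, None, 7.0],
    "Other": [1, 2, 3],
})

# Explicit copy: izmir_lim is now an independent DataFrame.
izmir_lim = izmir[["Gender", "Age"]].copy()

# The two assignments from the question, chained into one step.
izmir_lim["Age"] = izmir_lim["Age"].fillna(0).astype(int)

print(izmir_lim["Age"].tolist())   # cleaned ages in the copy
print(izmir["Age"].isna().sum())   # the original still has its missing value
```

Because izmir_lim no longer refers back to izmir, pandas has no reason to warn, and the original frame is left untouched.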
Whenever you 'create' a new dataframe from another in the following fashion:
new_df = old_df[list_of_columns_names]
new_df will have a truthy value in its is_copy attribute. When you attempt to assign to it, pandas raises the SettingWithCopyWarning.
new_df.iloc[0, 0] = 1 # Should raise SettingWithCopyWarning
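If you want to see for yourself whether the warning fires in your environment, a rough sketch like the following records warnings instead of printing them. The toy frame and column names are assumptions, and the result depends on the pandas version: older releases typically report SettingWithCopyWarning here, while releases running with Copy-on-Write may report nothing.

```python
import warnings

import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
old_df = pd.DataFrame({"a": [10, 20], "b": [30, 40], "c": [50, 60]})
new_df = old_df[["a", "b"]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning rather than printing it
    new_df.iloc[0, 0] = 1

# Names of whatever warnings were recorded (version dependent, may be empty).
print([w.category.__name__ for w in caught])

# The assignment lands in new_df; old_df is not modified.
print(new_df.iloc[0, 0], old_df.iloc[0, 0])
```

The point of the warning is that the assignment lands in new_df only, which may or may not be what you intended.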
You can overcome this in several ways.
Option #1
new_df = old_df[list_of_columns_names].copy()
Option #2 (as suggested by @ayhan in the comments)
new_df = old_df[list_of_columns_names]
new_df.is_copy = None  # older pandas only; is_copy was deprecated in 0.23 and later removed
Option #3
new_df = old_df.loc[:, list_of_columns_names]
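The same idea can be applied to the filtering function from the question. The sketch below is one possible rewrite, not the asker's actual code: it assumes geography, start_date and end_date are passed in as arguments rather than read from globals, builds a single boolean mask, and returns an explicit copy so callers can modify the result freely. The sample data is made up.

```python
import pandas as pd


def lim(df, geography, start_date, end_date):
    """Filter by center and survey date, returning an independent copy."""
    # between() is inclusive on both ends, matching the >= / <= in the question.
    mask = df["date_survey"].between(start_date, end_date)
    if geography != "All":
        mask = mask & (df["center_JO"] == geography)
    return df.loc[mask].copy()


# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "center_JO": ["Amman", "Irbid", "Amman"],
    "date_survey": pd.to_datetime(["2016-01-05", "2016-02-10", "2016-03-15"]),
    "score": [1, 2, 3],
})

df_lim = lim(df, "Amman", pd.Timestamp("2016-01-01"), pd.Timestamp("2016-02-28"))
df_lim["score"] = df_lim["score"] * 10  # modifies the copy only

print(df_lim["score"].tolist())
print(df["score"].tolist())
```

Putting the .copy() inside the function keeps it in one place, so you don't have to remember it at every call site.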