Python view vs copy error wants me to use .loc in script only
Question
I'm running a long script that has a dataframe `df`. As the script runs, building up and modifying `df` column by column, I get this warning over and over again in the command line:
```
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
```
But then it prints out the line that is causing the warning, and that line doesn't look like a problem. Lines such as the following trigger it (each line triggered it separately):
```python
# Excerpts from the script; terms_dict and the loop variable i are defined elsewhere.
df['ZIP_DENS'] = df['ZIP_DENS'].astype(str)
df['AVG_WAGE'] = df['AVG_WAGE'].astype(str).apply(lambda x: x if x != 'nan' else 'unknown')
df['TERM_BIN'] = df['TERMS'].map(terms_dict)
df['LOSS_ONE'] = 'T_' + df['TERM'].astype(str) + '_C_' + df['COMP'].astype(str) + df['SIZE']
# this one's inside a loop:
df[i + '_BIN'] = df[i + '_BIN'].apply(lambda x: x if x != 'nan' else 'unknown')
```
These are some examples of the mutations I'm making on the dataframe. This warning only recently started showing up, and I can't reproduce it in the interpreter. When I open a terminal and try something like the following, I get no warnings:
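```python
import pandas as pd

df = pd.DataFrame([list('ab'), list('ef')], columns=['first', 'second'])
# One column on each side: assigning the two-column frame df[['first', 'second']]
# to a single column raises ValueError in recent pandas versions.
df['third'] = df['first'].astype(str)
```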
Is there something I'm missing, something about the nature of DataFrames that this warning is trying to tell me? Could I have done something to this dataframe at the beginning of the script so that all subsequent mutations on the object are mutations on a view or a copy of it, or is something else weird going on?
Answer
As I mentioned in my comment, the likely issue is that somewhere upstream in your code you assigned a slice of some other `pd.DataFrame` to `df`.

This is a common cause of confusion, and it is also explained in the "Why does assignment fail when using chained indexing?" section of the documentation page that the warning links to.
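To find where that happens in a long script, it can help to stop at the first occurrence instead of reading the same warning over and over. A minimal sketch, assuming pandas 1.x/2.x with default options (the versions that emit this warning): the standard `warnings` module escalates just this message into an exception, so the traceback shows the first mutated line and its callers; the slice to look for is wherever `df` was last assigned before that line. The helpers `first_copy_warning` and `script_like` are hypothetical. In those pandas versions, `pd.set_option("mode.chained_assignment", "raise")` has a similar effect.

```python
import traceback
import warnings

import pandas as pd


def first_copy_warning(action):
    """Run `action` with the SettingWithCopy message escalated to an exception.

    Hypothetical helper: returns the formatted traceback of the first
    occurrence, or None if the warning never fired.
    """
    with warnings.catch_warnings():
        # \s* allows for a leading newline before the message text.
        warnings.filterwarnings(
            "error", message=r"\s*A value is trying to be set on a copy"
        )
        try:
            action()
        except Warning:
            return traceback.format_exc()
    return None


def script_like():
    # Hypothetical stand-in for the long script: an upstream slice,
    # followed by column-by-column edits of the result.
    data = pd.DataFrame({"a": range(7), "b": list("abcccdb")})
    df = data[data.a % 2 == 0]
    df["b"] = df["b"].astype(str)  # first mutation of the slice
    return df


report = first_copy_warning(script_like)
print(report if report else "no SettingWithCopyWarning raised")
```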
A minimal example:
```python
import pandas as pd

data = pd.DataFrame({'a': range(7), 'b': list('abcccdb')})
df = data[data.a % 2 == 0]  # making a subselection of the DataFrame
df['b'] = 'b'
```
```
/home/user/miniconda3/envs/myenv/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
```
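This also explains why the interpreter test in the question stayed quiet: that frame was built directly, not selected out of another frame. The sketch below makes the contrast concrete, assuming pandas 1.x/2.x with default options (releases that enable Copy-on-Write do not emit this warning). It records warnings for the same kind of column assignment on a directly constructed frame and on a boolean-mask selection; the helper names are hypothetical.

```python
import warnings

import pandas as pd


def copy_warnings(action):
    """Run `action` and return the names of any SettingWithCopyWarning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # don't let once-per-location filtering hide repeats
        action()
    return [w.category.__name__ for w in caught
            if w.category.__name__ == "SettingWithCopyWarning"]


def fresh_frame():
    # Built directly, as in the interpreter test: not derived from another frame.
    df = pd.DataFrame([list("ab"), list("ef")], columns=["first", "second"])
    df["third"] = df["first"].astype(str)
    return df


def sliced_frame():
    data = pd.DataFrame({"a": range(7), "b": list("abcccdb")})
    df = data[data.a % 2 == 0]  # boolean-mask selection from another frame
    df["b"] = "b"               # same kind of column assignment as in the script
    return df


print(copy_warnings(fresh_frame))  # → []
print(copy_warnings(sliced_frame))
```

On those versions the second list typically contains one `SettingWithCopyWarning`. In neither case is the original frame modified, because a boolean-mask selection already returns a copy.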
Note that this section:

```python
df = data[data.a % 2 == 0]  # making a subselection of the DataFrame
df['b'] = 'b'
```

can also be written like this:

```python
data[data.a % 2 == 0]['b'] = 'b'  # obvious chained indexing
df = data[data.a % 2 == 0]
```
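To see why the chained form is a problem, compare it with a single `.loc` call on the same frame. A minimal sketch, assuming any recent pandas version: the chained assignment writes into a temporary copy (and may emit a warning, silenced here), so `data` is left unchanged, while `.loc[mask, "b"]` writes into `data` itself.

```python
import warnings

import pandas as pd

data = pd.DataFrame({"a": range(7), "b": list("abcccdb")})
mask = data.a % 2 == 0

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the expected warning for this demo
    data[mask]["b"] = "b"            # chained: data[mask] is a temporary copy
print(data["b"].tolist())            # → ['a', 'b', 'c', 'c', 'c', 'd', 'b']

data.loc[mask, "b"] = "b"            # one indexing call: modifies data itself
print(data["b"].tolist())            # → ['b', 'b', 'b', 'c', 'b', 'd', 'b']
```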
The correct way to write this bit is the following:
```python
import pandas as pd

data = pd.DataFrame({'a': range(7), 'b': list('abcccdb')})
df = data.loc[data.a % 2 == 0].copy()  # making an explicit copy of the subselection
df.loc[:, 'b'] = 'b'
```
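The same pattern carries over to the question's script: take an explicit `.copy()` at the point where `df` is first cut out of a larger frame, and the later column-by-column mutations then act on an independent frame. A sketch under stated assumptions: the source frame `raw`, its values, and the `TERM < 36` filter are hypothetical; only the mutation lines mirror the question.

```python
import warnings

import pandas as pd

# Hypothetical upstream data standing in for whatever the script loads first.
raw = pd.DataFrame({
    "TERM": [12, 24, 36, 12],
    "COMP": [1, 2, 1, 3],
    "SIZE": ["S", "M", "L", "S"],
    "ZIP_DENS": [0.3, 0.7, 0.1, 0.5],
})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Explicit copy at the slicing step: df no longer refers back to raw.
    df = raw.loc[raw["TERM"] < 36].copy()
    # Mutations in the style of the question's script.
    df["ZIP_DENS"] = df["ZIP_DENS"].astype(str)
    df["LOSS_ONE"] = "T_" + df["TERM"].astype(str) + "_C_" + df["COMP"].astype(str) + df["SIZE"]

copy_msgs = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(len(copy_msgs))  # → 0
print(df["LOSS_ONE"].tolist())  # → ['T_12_C_1S', 'T_24_C_2M', 'T_12_C_3S']
```

If the intent was instead to change the original frame, write through it with a single `.loc[row_mask, column]` call rather than keeping a separate `df`.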