pandas: operations using groupby yield SettingWithCopyWarning

Question

Let's say I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

When I try to group on the team column and perform an operation I get a SettingWithCopyWarning:

for team, team_df in df.groupby(by='team'):
    # team_df = team_df.copy()  # produces no warning
    team_df['rank'] = 10  # produces warning
    team_df.loc[:, 'rank'] = 10  # produces warning

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
team_df['rank'] = 10

If I uncomment the line that makes a copy of the sub-DataFrame, I don't get the warning. Is this generally best practice for avoiding this warning, or am I doing something wrong?

Note that I don't want to edit the original DataFrame df. Also, I know this example can be done a better way, but my use case is much more complex: it requires grouping an original DataFrame and performing a series of operations based on a different DataFrame and the specs of that unique group.

Recommended answer

Once you grok this article and are confident you know how to avoid chained indexing (through use of .loc or .iloc), you can turn off the SettingWithCopyWarning with pd.options.mode.chained_assignment = None and never be bothered by this warning again.

Since you wrote "Note I don't want to edit the original DataFrame df" and you are properly using .loc to assign to team_df, it is clear you already know that modifying the copy (team_df) will not modify the original (df), so the SettingWithCopyWarning emitted here is just a nuisance.
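A minimal sketch of that point, reusing the DataFrame from the question: each group is modified and stored, and the original df is checked afterwards. The `ranked` dictionary is a hypothetical container added for illustration, and the explicit copy is there only so the sketch runs without the warning.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

ranked = {}  # hypothetical container for the per-team results
for team, team_df in df.groupby(by='team'):
    team_df = team_df.copy()   # explicit copy, as in the commented-out line of the question
    team_df['rank'] = 10       # changes only the per-team frame
    ranked[team] = team_df

# The original frame still has only its two columns.
print(list(df.columns))
# Each per-team frame carries the new column.
print(list(ranked['Rockets'].columns))
```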

The SettingWithCopyWarning comes up in all sorts of situations where you are coding properly, even with .loc or .iloc. There is no "proper" way to code which avoids sometimes triggering SettingWithCopyWarnings.

Therefore, I would just turn off this warning globally with

pd.options.mode.chained_assignment = None
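If changing the setting for the whole session feels too broad, the same option can be set temporarily. The sketch below is a variation on the line above, not something the answer itself proposes: it uses pd.option_context to silence the warning only inside the loop, assuming a pandas version that still provides the mode.chained_assignment option. The `group_sizes` dictionary is a hypothetical stand-in for real per-group work.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

group_sizes = {}  # hypothetical stand-in for real per-group work

# The option is set to None only inside the with-block.
with pd.option_context('mode.chained_assignment', None):
    for team, team_df in df.groupby(by='team'):
        team_df['rank'] = 10               # no SettingWithCopyWarning here
        group_sizes[team] = len(team_df)

# On leaving the block, the previous setting is restored.
print(pd.get_option('mode.chained_assignment'))
```

The trade-off is the same as with the global switch: the warning is also hidden for any genuinely chained assignment inside the block.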


I would generally not recommend using team_df = team_df.copy() just to avoid SettingWithCopyWarnings -- copying a dataframe can be a drain on performance especially when the dataframe is large or if done many times in a loop.

If you want to turn off the warning in just one location, you could use

team_df.is_copy = False

It serves the same purpose but will not be a performance drain. Note, however, that is_copy is not part of the documented Pandas API, and later pandas releases deprecated it, so it is not guaranteed to exist or be useful for this purpose in all versions of Pandas. So if robustness is a priority but performance isn't, then maybe use team_df = team_df.copy(). But I think the sounder way for an experienced Pandas programmer to go is to either turn the warning off globally or -- if you want to be very careful -- keep the warnings, check them manually, but accept that they will sometimes be triggered by correct code.
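One further option, not part of the answer above, is to avoid in-place assignment on the group altogether. The sketch below uses DataFrame.assign, which returns a new DataFrame and leaves its input untouched. It still creates a new object per group, so the performance caveat about copying applies here too. The `results` dictionary is a hypothetical name used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

# assign() builds a new frame with the extra column instead of mutating team_df.
results = {
    team: team_df.assign(rank=10)
    for team, team_df in df.groupby(by='team')
}

print(results['Warriors'])
```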
