Pandas: Chained assignments


Question

I have been reading about "Returning a view versus a copy". I do not really understand how chained assignment works in pandas, or how using .ix(), .iloc(), or .loc() affects it.

I get SettingWithCopyWarning for the following lines of code, where data is a pandas DataFrame and amount is the name of a column (Series) in that DataFrame:

data['amount'] = data['amount'].astype(float)

data["amount"].fillna(data.groupby("num")["amount"].transform("mean"), inplace=True)

data["amount"].fillna(mean_avg, inplace=True)

Looking at this code, is it obvious that I am doing something suboptimal? If so, can you tell me what the replacement lines should be?

I am aware of the note below and would like to think that the warnings in my case are false positives:

The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid assignment. There may be false positives; situations where a chained assignment is inadvertently reported.

EDIT: the code leading to the first copy warning.

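# lookup, date and qty are not shown in the question
def function1(row, date, qty):
    try:
        if row['currency'] == 'A':
            result = row[qty]
        else:
            # Rate for this row's date, taken from the column named after its currency
            rate = lookup[lookup['Date'] == row[date]][row['currency']]
            result = float(rate) * float(row[qty])
        return result
    except ValueError:  # a value could not be converted to float
        # Nothing is returned on this path, so the row's result is None
        print("The current row causes an exception:")

data['amount'] = data.apply(lambda row: function1(row, date, qty), axis=1)
data['amount'] = data['amount'].astype(float)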

Recommended answer

The point of SettingWithCopy is to warn you that you may be doing something that will not update the original data frame as you might expect.

Here, data is a DataFrame, possibly of a single dtype (or not). You are taking a reference to data['amount'], which is a Series, and updating it. This probably works in your case because you are returning the same dtype as the existing data.

However, it could create a copy and update that copy of data['amount'], which you would not see. Then you would be wondering why data is not updating.
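A minimal sketch of the difference, using a made-up three-row frame (the names df, mask, a and b are illustrative and do not come from the question). Filtering with a boolean mask returns a new object, so the chained form assigns into a temporary, while a single .loc call assigns into the original:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
mask = df["a"] > 1

# Chained assignment: df[mask] builds a new frame and ["b"] = 0 writes into
# that temporary. pandas normally warns here; the warning is silenced only
# to keep this demo quiet.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[mask]["b"] = 0

after_chained = df["b"].tolist()  # the original frame is unchanged

# Single indexing step: one .loc call selects the rows and the column together.
df.loc[mask, "b"] = 0
after_loc = df["b"].tolist()  # the original frame is updated
```

Which warning is shown for the chained line depends on the pandas version, but in either case the write does not reach df.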

Pandas returns a copy of an object in almost all method calls. The inplace operations are a convenience that works, but in general they do not make it clear that data is being modified, and they could potentially operate on a copy.

It is much clearer to do this:

data['amount'] = data["amount"].fillna(data.groupby("num")["amount"].transform("mean"))

data["amount"] = data['amount'].fillna(mean_avg)
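The two replacement lines can be tried end to end on a small frame. The data below is invented, and mean_avg is assumed here to be the overall mean of the column, since the question does not show how it is computed:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "num": [1, 1, 2, 2, 3],
    "amount": [10.0, np.nan, 4.0, np.nan, np.nan],
})

# Per-group mean aligned to the original rows. Group 3 has no known
# values, so its mean is NaN and that row stays missing after this step.
group_mean = data.groupby("num")["amount"].transform("mean")
data["amount"] = data["amount"].fillna(group_mean)

# Assumption: mean_avg is the column-wide mean, used as a fallback.
mean_avg = data["amount"].mean()
data["amount"] = data["amount"].fillna(mean_avg)

filled = data["amount"].tolist()
```

Each step assigns the returned Series back to the column, so it is explicit at every line that data is being modified.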

One further plus of working on copies: you can chain operations, which is not possible with inplace ones.

For example:

data['amount'] = data['amount'].fillna(mean_avg)*2

And just an FYI: inplace operations are neither faster nor more memory efficient. My two cents: they should be banned, but it is too late to change that API.

You can of course turn this warning off:

pd.set_option('chained_assignment',None)

FYI, pandas runs its entire test suite with this option set to raise, so we know if chaining is happening.
