How to properly apply a lambda function to a pandas data frame column


Question

I have a pandas data frame, sample, with a column called PR, to which I am applying a lambda function as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)

I then get the following syntax error message:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax

What am I doing wrong?

Recommended answer

You need mask, which replaces the values where the condition is True:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)

Another solution uses loc with boolean indexing:

sample.loc[sample['PR'] < 90, 'PR'] = np.nan

Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR':[10,100,40] })
print (sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print (sample)
      PR
0    NaN
1  100.0
2    NaN

sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print (sample)
      PR
0    NaN
1  100.0
2    NaN
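
The printed output above shows 100.0 rather than 100. A minimal sketch of why, assuming the same three-row sample: NaN is a floating-point value, so the masked column is converted from an integer dtype to a float dtype. The names before, masked and after are only illustrative.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})
before = sample['PR'].dtype  # integer dtype of the original column

# mask returns a new Series; values below 90 become NaN
masked = sample['PR'].mask(sample['PR'] < 90, np.nan)
after = masked.dtype  # float dtype, because NaN is a float

print(before, '->', after)
```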

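Solution with apply (a conditional expression needs an else branch, which is why the original lambda raised a SyntaxError):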

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
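
As a rough check that the three approaches agree, the sketch below runs each one on the small sample and compares the results. The variable names via_apply, via_mask and via_loc are made up for illustration, and the loc variant works on a float copy (an assumption made here so that assigning NaN does not have to change the dtype).

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})

# apply: the lambda body is a complete conditional expression (if ... else ...)
via_apply = sample['PR'].apply(lambda x: np.nan if x < 90 else x)

# mask: vectorized replacement where the condition is True
via_mask = sample['PR'].mask(sample['PR'] < 90, np.nan)

# loc with boolean indexing, applied to a float copy of the column
via_loc = sample['PR'].astype(float)
via_loc.loc[via_loc < 90] = np.nan

print(via_apply.equals(via_mask), via_mask.equals(via_loc))
```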

Timings, len(df)=300k:

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
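
The %timeit magic only works inside IPython. A possible way to repeat the comparison in a plain script is the standard-library timeit module, sketched below. The helper names with_apply and with_mask are hypothetical, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)  # 300k rows


def with_apply():
    # calls the Python lambda once per element
    return sample['PR'].apply(lambda x: np.nan if x < 90 else x)


def with_mask():
    # one vectorized comparison and replacement over the whole column
    return sample['PR'].mask(sample['PR'] < 90, np.nan)


# best of 3 repeats, 3 calls each, reported as time per call
t_apply = min(timeit.repeat(with_apply, number=3, repeat=3)) / 3
t_mask = min(timeit.repeat(with_mask, number=3, repeat=3)) / 3

print(f"apply: {t_apply * 1e3:.1f} ms per call")
print(f"mask:  {t_mask * 1e3:.1f} ms per call")
```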
