Mask values in a pandas dataframe based on condition

Question

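I need to replace the values in a dataframe that exceed a specific threshold with NaN.

For example, suppose I need to replace all values greater than 100 with NaN: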

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 250, 480],
                   'b': [60, 51, 101],
                   'c': [15, 689, 1]})

This should become:

({'a':[1,NaN,NaN],
  'b':[60,51,NaN],
  'c':[15,NaN,1]})

What is the best way to do this?

Recommended answer

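Use any one of the following; they are alternatives that produce the same result:

# mask: set values to NaN where the condition is True
df = df.mask(df > 100)

# where: keep values where the condition is True, NaN elsewhere
df = df.where(df <= 100)

# numpy.where: build the array, then restore the index and columns
df = pd.DataFrame(np.where(df > 100, np.nan, df), index=df.index, columns=df.columns)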

print(df)
     a     b     c
0  1.0  60.0  15.0
1  NaN  51.0   NaN
2  NaN   NaN   1.0
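
As a quick sanity check, the three alternatives can be run side by side on the frame from the question. This is a minimal sketch; the variable names `masked`, `kept`, `via_numpy` and `same` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 250, 480],
                   'b': [60, 51, 101],
                   'c': [15, 689, 1]})

# mask() replaces values where the condition is True.
masked = df.mask(df > 100)

# where() keeps values where the condition is True, so the test is inverted.
kept = df.where(df <= 100)

# numpy.where returns a plain array, so the labels are re-attached by hand.
via_numpy = pd.DataFrame(np.where(df > 100, np.nan, df),
                         index=df.index, columns=df.columns)

# DataFrame.equals treats NaNs in the same positions as equal.
same = masked.equals(kept) and masked.equals(via_numpy)
print(same)
print(masked.dtypes)
```

The integer columns come back as floats (1.0, 60.0, ...) because NaN is a floating-point value, so pandas upcasts any integer column that receives one.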

Quick timing comparison (results depend on the data):

df = pd.concat([df] * 10000, ignore_index=True)

In [104]: %timeit pd.DataFrame(np.where(df > 100, np.nan, df), index=df.index, columns=df.columns)
The slowest run took 4.37 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 683 µs per loop

In [105]: %timeit df[:] = np.where(df.values <= 100, df.values, np.nan)
__main__:257: RuntimeWarning: invalid value encountered in less_equal
The slowest run took 17.24 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 957 µs per loop

In [106]: %timeit df.mask(df > 100)
1000 loops, best of 3: 1.56 ms per loop

In [107]: %timeit df.where(df <= 100)
The slowest run took 8.01 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.84 ms per loop

In [108]: %timeit df[df<100]
The slowest run took 5.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.89 ms per loop
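
The timings above come from one IPython session and will differ by machine, pandas version and data. To rerun the comparison outside IPython, something like the following sketch with the standard-library `timeit` module could be used. The names `big`, `candidates` and `timings`, and the repeat counts, are assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 250, 480],
                   'b': [60, 51, 101],
                   'c': [15, 689, 1]})

# Same enlargement as in the comparison above: 30,000 rows.
big = pd.concat([df] * 10000, ignore_index=True)

# Each candidate is a zero-argument callable so timeit can invoke it directly.
candidates = {
    'np.where': lambda: pd.DataFrame(np.where(big > 100, np.nan, big),
                                     index=big.index, columns=big.columns),
    'mask': lambda: big.mask(big > 100),
    'where': lambda: big.where(big <= 100),
}

# Best of 3 repeats, 10 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(fn, number=10, repeat=3)) / 10
    for name, fn in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes; it does not make the numbers comparable across machines.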
