Randomly insert NA values in a pandas DataFrame


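Problem description

How can I randomly insert np.nan values into a DataFrame? Say I want 10% of the values in my DataFrame to be null.

My data looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])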

        one       two     three
a  0.695132  1.044791 -1.059536
b -1.075105  0.825776  1.899795
c -0.678980  0.051959 -0.691405
d -0.182928  1.455268 -1.032353
e  0.205094  0.714192 -0.938242

Is there an easy way to insert the null values?

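Solution

Here's a way to clear exactly 10% of the cells (or rather, as close to 10% as the DataFrame's size allows). Note that it modifies df in place.

import random

# All (row, col) positions in the frame
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
# Pick 10% of the positions without replacement and set them to NaN
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan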
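
For larger frames, the same "fixed number of cells" idea can be expressed without a Python-level loop. The following is only a sketch, not part of the original answer: the helper name `insert_nan_exact` and its `frac` and `seed` parameters are hypothetical, it assumes every column is numeric, and it returns a copy instead of modifying df in place.

```python
import numpy as np
import pandas as pd


def insert_nan_exact(df, frac=0.1, seed=None):
    """Return a copy of df with round(frac * df.size) cells set to NaN.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    rng = np.random.default_rng(seed)
    n_missing = int(round(frac * df.size))
    # Draw distinct flat positions, then map them back to (row, col) pairs.
    flat = rng.choice(df.size, size=n_missing, replace=False)
    rows, cols = np.unravel_index(flat, df.shape)
    values = df.to_numpy(dtype=float, copy=True)
    values[rows, cols] = np.nan
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'b', 'c', 'd', 'e'],
                  columns=['one', 'two', 'three'])
out = insert_nan_exact(df, frac=0.1, seed=0)
print(out)
print("NaN cells:", int(out.isna().sum().sum()))
```

Passing a seed makes the choice of cells reproducible, which can help when the masked frame feeds a test or a benchmark.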
 

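Here's a way to clear each cell independently with a probability of 10%, so the number of cleared cells varies from run to run.

df = df.mask(np.random.random(df.shape) < .1)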
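
Because each cell is masked independently, the realized share of NaN values is only approximately 10%. A small check along these lines illustrates that; the frame `big`, its 1000-row size, and the seed are arbitrary assumptions made for the illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# A larger frame than the 5x3 example, so the fraction is meaningful.
big = pd.DataFrame(rng.standard_normal((1000, 3)),
                   columns=['one', 'two', 'three'])

# True marks the cells to blank; each cell is True with probability 0.1.
mask = rng.random(big.shape) < 0.1
masked = big.mask(mask)

# Share of cells that ended up as NaN: close to 0.1, but not exactly 0.1.
nan_fraction = masked.isna().to_numpy().mean()
print(nan_fraction)
```

On a tiny frame such as the 5x3 example, this approach can easily produce zero NaN values, or several more than 10% of the cells, so the fixed-count approach may be preferable when the exact number matters.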
 

