How to replace every NaN in a column with different random values using pandas?

Question

I have been playing with pandas lately, and I am now trying to replace the NaN values in a DataFrame with different random values drawn from a normal distribution.

Assume I have this CSV file without a header:

      0
0    343
1    483
2    101
3    NaN
4    NaN
5    NaN

My expected result should be something like this:

       0
0     343
1     483
2     101
3     randomnumber1
4     randomnumber2
5     randomnumber3

But instead I got the following:

       0
0     343
1     483
2     101
3     randomnumber1
4     randomnumber1
5     randomnumber1    # all NaN filled with same number

My code so far:

import numpy as np
import pandas as pd

df = pd.read_csv("testfile.csv", header=None)
mu, sigma = df.mean(), df.std()
norm_dist = np.random.normal(mu, sigma, 1)
for i in norm_dist:
    print(df.fillna(i))  # fills every NaN with the same single draw

I am thinking of counting the NaN rows in the DataFrame and replacing the 1 in np.random.normal(mu, sigma, 1) with that count, so that each NaN can get a different value.

But I would like to ask whether there is a simpler way to do this.

Thank you for any help and suggestions.

Recommended answer

Here's one way, working with the underlying array data:

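import numpy as np
import pandas as pd

def fillNaN_with_unifrand(df):
    # Despite the name, the fill values come from a normal distribution.
    a = df.to_numpy(dtype=float, copy=True)  # explicit copy; df.values may be read-only
    m = np.isnan(a)  # mask of NaNs
    # Mean and sample std over all non-NaN values (single-column case as in the question)
    mu, sigma = np.nanmean(a), np.nanstd(a, ddof=1)
    a[m] = np.random.normal(mu, sigma, size=m.sum())  # one draw per NaN
    return pd.DataFrame(a, index=df.index, columns=df.columns)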

In essence, we generate all the random numbers in one go, passing the count of NaNs as the size parameter of np.random.normal, and then assign them in one go using the same NaN mask.
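The same mask-and-assign idea can also be sketched directly on a pandas Series, without touching the underlying array. This is only a minimal illustration: the function name fill_nan_normal and the seed parameter are hypothetical and not part of the original answer, and it assumes a numeric column with at least two non-NaN values so that the standard deviation is defined.

```python
import numpy as np
import pandas as pd

def fill_nan_normal(series, seed=None):
    """Return a copy of `series` with each NaN replaced by its own normal draw.

    Hypothetical helper for illustration; not from the original answer.
    """
    rng = np.random.default_rng(seed)        # optional seed for reproducibility
    out = series.astype(float)               # float copy of the input
    mask = out.isna()                        # True where a value is missing
    mu, sigma = out.mean(), out.std()        # pandas skips NaN by default
    # Draw exactly as many numbers as there are NaNs, then assign via the mask
    out.loc[mask] = rng.normal(mu, sigma, size=int(mask.sum()))
    return out

s = pd.Series([343, 483, 101, np.nan, np.nan, np.nan])
filled = fill_nan_normal(s, seed=0)
print(filled)
```

Because size equals the number of NaNs, each missing entry receives a separate draw, which is what fillna with a single scalar cannot do.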

Sample run:

In [435]: df
Out[435]: 
       0
0  343.0
1  483.0
2  101.0
3    NaN
4    NaN
5    NaN

In [436]: fillNaN_with_unifrand(df)
Out[436]: 
            0
0  343.000000
1  483.000000
2  101.000000
3  138.586483
4  223.454469
5  204.464514
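The sample above has a single column. If a DataFrame has several numeric columns, one possible extension is to apply the same step column by column, so that each column uses its own mean and standard deviation. The sketch below is an assumption about how that might look, not something stated in the original answer; the name fill_nan_normal_per_column, the seed parameter, and the example column names are all hypothetical, and every column is assumed to be numeric with at least two non-NaN values.

```python
import numpy as np
import pandas as pd

def fill_nan_normal_per_column(df, seed=None):
    """Return a float copy of `df` with NaNs filled per column (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    out = df.astype(float)                   # work on a copy, leave df untouched
    for col in out.columns:
        mask = out[col].isna()
        n_missing = int(mask.sum())
        if n_missing == 0:
            continue                         # nothing to fill in this column
        mu, sigma = out[col].mean(), out[col].std()
        # One independent draw per missing cell in this column
        out.loc[mask, col] = rng.normal(mu, sigma, size=n_missing)
    return out

df = pd.DataFrame({
    "a": [343, 483, 101, np.nan, np.nan, np.nan],
    "b": [1.0, np.nan, 3.0, 5.0, np.nan, 7.0],
})
filled = fill_nan_normal_per_column(df, seed=1)
print(filled)
```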
