Replace NaN value with a median?


Question

I am trying to use pandas to replace all NaN values in a table with the median over a particular range. My real dataset is larger, but here is a small example:

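import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range('2020-09-24', periods=20, freq='12min')  # 12 minutes = 0.2 hours
df = pd.DataFrame({'Date': rng, 'Val': np.random.randn(len(rng)), 'Dist': np.random.randn(len(rng))})
# Use .loc rather than chained indexing so the assignment reliably modifies df.
df.loc[df.Dist <= -0.6, 'Dist'] = np.nan
df.loc[df.Val <= -0.5, 'Val'] = np.nan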

What I want to do is replace the NaN values in Val and Dist with that column's median for each hour. I have managed to get the median values into separate reference tables:

df.set_index('Date', inplace=True)
df = df.assign(Hour = lambda x : x.index.hour)
df_val = df[["Val", "Hour"]].groupby("Hour").median()
df_dist = df[["Dist", "Hour"]].groupby("Hour").median()

But I have tried all of the commands below, in various forms, and cannot work out how to fill the NaN values:

df[["Val","Hour"]].mask(df['Val'].isna(), df_val.iloc[df.Hour], inplace=True)

df.where(df['Val'].notna(), other=df_val[df.Hour],axis = 0)

df["Val"] = np.where(df['Val'].notna(), df['Val'], df_val(df.Hour))

df.replace({"Val":{np.nan:df_val[df.Hour]}, "Dist":{np.nan:df_dist[df.Hour]}})

Recommended answer

You can use groupby.transform and fillna. transform returns each group's median aligned to the original rows, so fillna can take the result directly:

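cols = ['Val', 'Dist']
# Assumes Date is still a regular column (run this before set_index).
# Group rows by the hour they fall in and broadcast each group's median back to its rows.
hourly_median = df.groupby(df.Date.dt.floor('h'))[cols].transform('median')
# Only the NaN entries are replaced; existing values are kept.
df[cols] = df[cols].fillna(hourly_median)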

Output:

                  Date       Val      Dist
0  2020-09-24 00:00:00  1.764052  0.864436
1  2020-09-24 00:12:00  0.400157  0.653619
2  2020-09-24 00:24:00  0.978738  0.864436
3  2020-09-24 00:36:00  2.240893  0.864436
4  2020-09-24 00:48:00  1.867558  2.269755
5  2020-09-24 01:00:00  0.153690  0.757559
6  2020-09-24 01:12:00  0.950088  0.045759
7  2020-09-24 01:24:00 -0.151357 -0.187184
8  2020-09-24 01:36:00 -0.103219  1.532779
9  2020-09-24 01:48:00  0.410599  1.469359
10 2020-09-24 02:00:00  0.144044  0.154947
11 2020-09-24 02:12:00  1.454274  0.378163
12 2020-09-24 02:24:00  0.761038  0.154947
13 2020-09-24 02:36:00  0.121675  0.154947
14 2020-09-24 02:48:00  0.443863 -0.347912
15 2020-09-24 03:00:00  0.333674  0.156349
16 2020-09-24 03:12:00  1.494079  1.230291
17 2020-09-24 03:24:00 -0.205158  1.202380
18 2020-09-24 03:36:00  0.313068 -0.387327
19 2020-09-24 03:48:00  0.323371 -0.302303
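
If you would rather reuse a per-hour reference table, as in the question, one possible approach is to map each row's hour to its median and pass that to fillna. The following is a minimal sketch under some assumptions: Date is still a regular column, the data covers a single day (so grouping by hour of day matches flooring to the hour), and the names medians, filled and via_transform are illustrative only, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question.
np.random.seed(0)
rng = pd.date_range('2020-09-24', periods=20, freq='12min')
df = pd.DataFrame({'Date': rng,
                   'Val': np.random.randn(len(rng)),
                   'Dist': np.random.randn(len(rng))})
df.loc[df.Dist <= -0.6, 'Dist'] = np.nan
df.loc[df.Val <= -0.5, 'Val'] = np.nan

cols = ['Val', 'Dist']
hour = df['Date'].dt.hour  # hour of day for every row

# Reference table: one row per hour, one column per value column.
# median() skips NaN values by default.
medians = df.groupby(hour)[cols].median()

# Look up each row's hourly median and fill only the missing entries.
filled = df.copy()
for col in cols:
    filled[col] = df[col].fillna(hour.map(medians[col]))

# Cross-check against the transform-based approach from the answer.
via_transform = df[cols].fillna(df.groupby(hour)[cols].transform('median'))
print(np.allclose(filled[cols], via_transform))
```

Both routes should give the same result here. The transform version is shorter; the lookup-table version may be handy when the medians come from a different dataset than the one being filled.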
