Filling NaN by 'ffill' and 'interpolate' depending on time of the day of NaN occurrence in Python
Question
I want to fill NaN values in a df using 'ffill' and 'interpolate', depending on the time of day at which each NaN occurs. As you can see below, the first NaN occurs at 6 am and the second at 8 am.
02/03/2016 05:00 8
02/03/2016 06:00 NaN
02/03/2016 07:00 1
02/03/2016 08:00 NaN
02/03/2016 09:00 3
My df consists of thousands of days. I want to apply 'ffill' to any NaN that occurs before 7 am and 'interpolate' to any that occurs after 7 am. My data runs from 6 am to 6 pm.
My attempt is:
df_imputed = (df.between_time("00:00:00", "07:00:00", include_start=True, include_end=False)).ffill()
df_imputed = (df.between_time("07:00:00", "18:00:00", include_start=True, include_end=True)).interpolate()
Edit: my df contains around 400 columns, so the procedure needs to apply to all columns.
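Recommended answer
Original question: single series of values
You can define a Boolean series and then choose between interpolate and ffill according to your condition via numpy.where: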
# setup
import numpy as np
import pandas as pd
df = pd.DataFrame({'date': ['02/03/2016 05:00', '02/03/2016 06:00', '02/03/2016 07:00',
'02/03/2016 08:00', '02/03/2016 09:00'],
'value': [8, np.nan, 1, np.nan, 3]})
df['date'] = pd.to_datetime(df['date'])
# construct Boolean switch series: True where the time of day is strictly after 07:00
switch = (df['date'] - df['date'].dt.normalize()) > pd.to_timedelta('07:00:00')
# use numpy.where: interpolated values where switch is True, forward-filled values elsewhere
df['value'] = np.where(switch, df['value'].interpolate(), df['value'].ffill())
print(df)
                 date  value
0 2016-02-03 05:00:00    8.0
1 2016-02-03 06:00:00    8.0
2 2016-02-03 07:00:00    1.0
3 2016-02-03 08:00:00    2.0
4 2016-02-03 09:00:00    3.0
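The switch in the answer uses a strict `>`, so a NaN at exactly 07:00 is forward-filled rather than interpolated. The question does not say which side 07:00 belongs to, so the following is only a sketch of how the two choices differ. The sample values (with a NaN placed at 07:00) and the names `strict`, `inclusive`, `filled_strict` and `filled_inclusive` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: NaNs at 06:00, 07:00 and 08:00
df = pd.DataFrame({
    'date': pd.to_datetime(['02/03/2016 05:00', '02/03/2016 06:00', '02/03/2016 07:00',
                            '02/03/2016 08:00', '02/03/2016 09:00']),
    'value': [8, np.nan, np.nan, np.nan, 3],
})

# Time of day expressed as an offset from midnight
offset = df['date'] - df['date'].dt.normalize()
cutoff = pd.Timedelta(hours=7)

strict = offset > cutoff       # 07:00 itself is forward-filled
inclusive = offset >= cutoff   # 07:00 itself is interpolated

interpolated = df['value'].interpolate()
forward_filled = df['value'].ffill()

filled_strict = np.where(strict, interpolated, forward_filled)
filled_inclusive = np.where(inclusive, interpolated, forward_filled)

print(filled_strict)     # value at 07:00 comes from ffill
print(filled_inclusive)  # value at 07:00 comes from interpolate
```

If the 07:00 reading should be interpolated, as the `include_start=True` in the question's attempt suggests, `>=` is probably the comparison to use.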
Updated question: multiple series of values
With multiple value columns, you can adjust the above solution using pd.DataFrame.where and iloc. Or, instead of iloc, you can use loc or other means (e.g. filter) of selecting columns:
# setup
df = pd.DataFrame({'date': ['02/03/2016 05:00', '02/03/2016 06:00', '02/03/2016 07:00',
'02/03/2016 08:00', '02/03/2016 09:00'],
'value': [8, np.nan, 1, np.nan, 3],
'value2': [3, np.nan, 2, np.nan, 6]})
df['date'] = pd.to_datetime(df['date'])
# construct Boolean switch series: True where the time of day is strictly after 07:00
switch = (df['date'] - df['date'].dt.normalize()) > pd.to_timedelta('07:00:00')
# use DataFrame.where: keep interpolated values where switch is True, forward-filled values elsewhere
df.iloc[:, 1:] = df.iloc[:, 1:].interpolate().where(switch, df.iloc[:, 1:].ffill())
print(df)
                 date  value  value2
0 2016-02-03 05:00:00    8.0     3.0
1 2016-02-03 06:00:00    8.0     3.0
2 2016-02-03 07:00:00    1.0     2.0
3 2016-02-03 08:00:00    2.0     4.0
4 2016-02-03 09:00:00    3.0     6.0
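The question's attempt uses between_time, which suggests the timestamps are held in a DatetimeIndex rather than a 'date' column. Under that assumption, the same idea can be sketched as a small helper applied to every column at once. The function name `fill_by_time_of_day` and its `cutoff` parameter are hypothetical, and the comparison here is `>=`, so 07:00 itself is interpolated, unlike the strict `>` used in the answer.

```python
import numpy as np
import pandas as pd

def fill_by_time_of_day(frame, cutoff="07:00:00"):
    """Forward-fill NaNs before `cutoff`; interpolate NaNs at or after it.

    Assumes `frame` has a DatetimeIndex and only numeric columns.
    """
    # Time of day as an offset from midnight, one entry per row
    offset = frame.index - frame.index.normalize()
    use_interp = np.asarray(offset >= pd.Timedelta(cutoff))
    # Broadcast the per-row mask across all columns
    filled = np.where(use_interp[:, None],
                      frame.interpolate().to_numpy(),
                      frame.ffill().to_numpy())
    return pd.DataFrame(filled, index=frame.index, columns=frame.columns)

# Sample data from the answer, indexed by timestamp
idx = pd.to_datetime(['02/03/2016 05:00', '02/03/2016 06:00', '02/03/2016 07:00',
                      '02/03/2016 08:00', '02/03/2016 09:00'])
df = pd.DataFrame({'value': [8, np.nan, 1, np.nan, 3],
                   'value2': [3, np.nan, 2, np.nan, 6]}, index=idx)

result = fill_by_time_of_day(df)
print(result)
```

With data spanning many days, a plain ffill carries the previous evening's last value into an early-morning gap. If that is not wanted, one option is to forward-fill per day, for example with `frame.groupby(frame.index.normalize()).ffill()`.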