Need to fill NA values with the mean of the three values before them in Python
Problem description
I need to fill each NA value with the mean of the three values that precede it.
This is my dataset:
    RECEIPT_MONTH_YEAR  NET_SALES
0   2014-01-01          818817.20
1   2014-02-01          362377.20
2   2014-03-01          374644.60
3   2014-04-01          NA
4   2014-05-01          NA
5   2014-06-01          NA
6   2014-07-01          NA
7   2014-08-01          46382.50
8   2014-09-01          55933.70
9   2014-10-01          292303.40
10  2014-10-01          382928.60
Recommended answer
Is this dataset a .csv file or a DataFrame? And is the NA an actual NaN or the string 'NA'?
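One way to answer that question is to look at how pandas parses the column. The sketch below uses a small inline sample (hypothetical, standing in for the real file) and relies on the fact that `read_csv` treats the token `NA` as missing by default; with `keep_default_na=False` the same token stays a literal string.

```python
import io
import pandas as pd

# Hypothetical inline sample standing in for the real file
raw = "RECEIPT_MONTH_YEAR NET_SALES\n2014-03-01 374644.60\n2014-04-01 NA\n"

# Default parsing: 'NA' is one of the tokens read_csv treats as missing
parsed = pd.read_csv(io.StringIO(raw), sep=" ")

# With default NA handling switched off, 'NA' stays a literal string
literal = pd.read_csv(io.StringIO(raw), sep=" ", keep_default_na=False)

# A numeric dtype with a non-zero NaN count means real missing values;
# a string-like dtype containing 'NA' means the column needs converting first
print(parsed["NET_SALES"].dtype, parsed["NET_SALES"].isna().sum())
print(literal["NET_SALES"].dtype, literal["NET_SALES"].tolist())
```

If the column turns out to hold strings, converting it (for example by replacing 'NA' with `np.nan`) is needed before any fill method will have an effect.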
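import pandas as pd
import numpy as np
# 'your dataset' is a placeholder for the path to the file
df = pd.read_csv('your dataset', sep=' ')
# replace() returns a new DataFrame, so assign the result back
df = df.replace('NA', np.nan)
# Forward-fill: each NaN takes the last valid observation before it
df = df.ffill()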
You mention the mean of 3 values. The code above simply forward-fills the last observation before the NaNs begin. This is often a reasonable approach for forecasting (in some cases better than taking means, if persistence is important). To use the mean of the three preceding values instead:
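# Use this in place of the forward fill, while NET_SALES still contains NaNs
ind = df.index[df['NET_SALES'].isna()]
# Mean of the three NET_SALES values just before the first NaN (assumes the default integer index)
Meanof3 = df['NET_SALES'].iloc[ind[0]-3:ind[0]].mean(skipna=True)
# Fill every NaN in the column with that mean and assign the result back
df['NET_SALES'] = df['NET_SALES'].fillna(Meanof3)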
The answer could perhaps be generalised and improved with more information about the dataset, for example if you always want to take the mean of the last 3 measurements before any NA. The code above finds the indices that are NaN and then takes the mean of the 3 values before the first of them, ignoring any NaNs.
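A possible generalisation, sketched under one assumption the question leaves open: within a run of consecutive NAs, values that have already been filled count as history for the next NA. The helper name `fill_with_trailing_mean` is hypothetical, not a pandas function, and the sample frame simply reproduces the data from the question.

```python
import numpy as np
import pandas as pd

def fill_with_trailing_mean(values, window=3):
    """Fill each NaN with the mean of the `window` values just before it.

    Assumption: previously filled values are used as history for later NaNs.
    NaNs with no usable history (e.g. at the very start) are left as NaN.
    """
    filled = values.astype(float).to_numpy(copy=True)
    for i in range(len(filled)):
        if np.isnan(filled[i]):
            history = filled[max(0, i - window):i]
            history = history[~np.isnan(history)]
            if history.size:
                filled[i] = history.mean()
    return pd.Series(filled, index=values.index, name=values.name)

# Data from the question, with NA represented as np.nan
df = pd.DataFrame({
    "RECEIPT_MONTH_YEAR": ["2014-01-01", "2014-02-01", "2014-03-01",
                           "2014-04-01", "2014-05-01", "2014-06-01",
                           "2014-07-01", "2014-08-01", "2014-09-01",
                           "2014-10-01", "2014-10-01"],
    "NET_SALES": [818817.20, 362377.20, 374644.60, np.nan, np.nan, np.nan,
                  np.nan, 46382.50, 55933.70, 292303.40, 382928.60],
})

df["NET_SALES"] = fill_with_trailing_mean(df["NET_SALES"])
print(df)
```

If every NA in a run should instead receive the same value (the mean of the three observations before the run), compute that mean once per run rather than updating the history as the loop advances.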