在股市数据中添加缺少的日行以保持 pandas 数据框的连续性 [英] Add missing day rows in stock market data to maintain continuity in pandas dataframe
问题描述
因此,我有大约13年的股市数据,每日低点高开收盘.问题在于市场有时会在两天之间关闭,因此星期一至星期五有时可能不会连续出现.看下面
So I have around 13 years of stock market data of daily low high open close. The problem is the markets are closed sometimes in between and hence Monday to Friday might not appear continuously sometimes. Look below
日期 | Day | 打开 | 高 | 低 | 关闭 | 调整后的收盘价 | ||
---|---|---|---|---|---|---|---|---|
0 | 17-09-2007 | 星期一 | 6898 | 6977.2 | 6843 | 6897.1 | 6897.100098 | |
1 | 18-09-2007 | 星期二 | 6921.15 | 7078.95 | 6883.6 | 7059.65 | 7059.649902 | |
2 | 19-09-2007 | 星期三 | 7111 | 7419.35 | 7111 | 7401.85 | 7401.850098 | |
3 | 20-09-2007 | 星期四 | 7404.95 | 7462.9 | 7343.6 | 7390.15 | 7390.149902 | |
4 | 21-09-2007 | 星期五 | 7378.3 | 7506.35 | 7367.15 | 7464.5 | 7464.5 | |
5 | 25-09-2007 | 星期二 | 7658.5 | 7694.25 | 7490.2 | 7629.15 | 7629.149902 | |
7 | 26-09-2007 | 星期三 | 7647.1 | 7829.85 | 7591.8 | 7755.9 | 7755.899902 | |
8 | 28-09-2007 | 星期五 | 7838.25 | 8082.85 | 7836.05 | 8042.2 | 8042.200195 | |
9 | 01-10-2007 | 星期一 | 8008.55 | 8085.15 | 7913.3 | 7987.5 | 7987.5 | |
10 | 03-10-2007 | 星期三 | 8029.8 | 8235.8 | 7820.25 | 8097.9 | 8097.899902 | |
11 | 04-10-2007 | 星期四 | 8083.3 | 8086.7 | 7828.65 | 8035.9 | 8035.899902 | |
12 | 05-10-2007 | 星期五 | 8038.1 | 8066.55 | 7789.7 | 7845.25 | 7845.25 | |
13 | 08-10-2007 | 星期一 | 7853.15 | 7935.45 | 7516.45 | 7626.4 | 7626.399902 | |
14 | 09-10-2007 | 星期二 | 7580.9 | 7916.45 | 7535.05 | 7895.85 | 7895.850098 | |
15 | 10-10-2007 | 星期三 | 7960.65 | 8081.05 | 7907.35 | 8030.65 | 8030.649902 | |
16 | 11-10-2007 | 星期四 | 8054.3 | 8177.75 | 8005.5 | 8158.8 | 8158.799805 | |
17 | 15-10-2007 | 星期一 | 7962.55 | 8306.35 | 7962.55 | 8286.3 | 8286.299805 | |
18 | 16-10-2007 | 星期二 | 8361.4 | 8491.65 | 8240.3 | 8452.2 | 8452.200195 | |
19 | 17-10-2007 | 星期三 | 8071.55 | 8218.1 | 7641.5 | 8099.9 | 8099.899902 | |
20 | 18-10-2007 | 星期四 | 8055.15 | 8192.45 | 7519.7 | 7608.75 | 7608.75 | |
21 | 19-10-2007 | 星期五 | 7637.4 | 7718.9 | 7279.3 | 7423.8 | 7423.799805 |
我想拥有从星期一到星期五具有相同模式的统一数据.因此,无论何时缺少哪一天,我都想用前一天的值替换它的值.例如,在4号和5号之间,我将星期一添加24-09-2007,然后从4号(星期五)开始复制所有其他值.有时可能会连续丢失两天,在这种情况下,必须再次复制最后一个值.例如,如果星期一和星期二都不见了,我将添加星期一,复制星期五的数据,然后添加星期二并复制星期一的数据.如何使用python熊猫做到这一点?请记住,有13年的数据,因此会有leap年,年份变化等.
I want to have uniform data that has the same pattern from Monday to Friday. Hence wherever a day in between is missing, I want to replace its value with previous day values. For eg., between no 4 and 5, I would add 24-09-2007 Monday and then all other values copied from no 4 (Friday). Sometimes it can happen that consecutively 2 days are missing in that case again the last value has to be copied. for eg if both Monday and Tuesday were missing, I'd add Monday, copy Friday's data, then add Tuesday and copy Monday's data. How do I do it using python pandas? Remember there are 13 years of data so there will be leap years, year change, etc.
推荐答案
过程:
- 设置日期索引
- 使用"D"重新采样数据框
- 今天是星期六和星期日
- 带有填充"的fillna
# Data Index should be timeSeries
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
# upsample
dfn = df.set_index('Date').resample('D').asfreq()
# covert Date to Monday .. Sunday
dfn['Day'] = dfn.index.strftime('%A')
# drop Day in ['Saturday', 'Sunday']
cond = dfn['Day'].isin(['Saturday', 'Sunday'])
dfn = dfn[~cond].fillna(method='ffill').reset_index()
输出:
Date Day Open High Low Close Adjusted Close
0 2007-09-17 Monday 6898.00 6977.20 6843.00 6897.10 6897.100098
1 2007-09-18 Tuesday 6921.15 7078.95 6883.60 7059.65 7059.649902
2 2007-09-19 Wednesday 7111.00 7419.35 7111.00 7401.85 7401.850098
3 2007-09-20 Thursday 7404.95 7462.90 7343.60 7390.15 7390.149902
4 2007-09-21 Friday 7378.30 7506.35 7367.15 7464.50 7464.500000
5 2007-09-24 Monday 7378.30 7506.35 7367.15 7464.50 7464.500000
6 2007-09-25 Tuesday 7658.50 7694.25 7490.20 7629.15 7629.149902
7 2007-09-26 Wednesday 7647.10 7829.85 7591.80 7755.90 7755.899902
8 2007-09-27 Thursday 7647.10 7829.85 7591.80 7755.90 7755.899902
9 2007-09-28 Friday 7838.25 8082.85 7836.05 8042.20 8042.200195
10 2007-10-01 Monday 8008.55 8085.15 7913.30 7987.50 7987.500000
11 2007-10-02 Tuesday 8008.55 8085.15 7913.30 7987.50 7987.500000
12 2007-10-03 Wednesday 8029.80 8235.80 7820.25 8097.90 8097.899902
13 2007-10-04 Thursday 8083.30 8086.70 7828.65 8035.90 8035.899902
14 2007-10-05 Friday 8038.10 8066.55 7789.70 7845.25 7845.250000
15 2007-10-08 Monday 7853.15 7935.45 7516.45 7626.40 7626.399902
16 2007-10-09 Tuesday 7580.90 7916.45 7535.05 7895.85 7895.850098
17 2007-10-10 Wednesday 7960.65 8081.05 7907.35 8030.65 8030.649902
18 2007-10-11 Thursday 8054.30 8177.75 8005.50 8158.80 8158.799805
19 2007-10-12 Friday 8054.30 8177.75 8005.50 8158.80 8158.799805
20 2007-10-15 Monday 7962.55 8306.35 7962.55 8286.30 8286.299805
21 2007-10-16 Tuesday 8361.40 8491.65 8240.30 8452.20 8452.200195
22 2007-10-17 Wednesday 8071.55 8218.10 7641.50 8099.90 8099.899902
23 2007-10-18 Thursday 8055.15 8192.45 7519.70 7608.75 7608.750000
24 2007-10-19 Friday 7637.40 7718.90 7279.30 7423.80 7423.799805
这篇关于在股市数据中添加缺少的日行以保持 pandas 数据框的连续性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!