reindex to add missing dates to pandas dataframe
Question
I am trying to parse a CSV file that looks like this:
dd.mm.yyyy value
01.01.2000 1
02.01.2000 2
01.02.2000 3
I need to add the missing dates and fill the corresponding values with NaN. I used Series.reindex, as in this question:
import pandas as pd
ts=pd.read_csv(file, sep=';', parse_dates='True', index_col=0)
idx = pd.date_range('01.01.2000', '02.01.2000')
ts.index = pd.DatetimeIndex(ts.index)
ts = ts.reindex(idx, fill_value='NaN')
But in the result, the values for certain dates are swapped because the dates are parsed as mm/dd instead of dd/mm:
01.01.2000 1
02.01.2000 3
03.01.2000 NaN
...
...
31.01.2000 NaN
01.02.2000 2
I tried several things (e.g. adding dayfirst=True to read_csv) to get it right, but I still can't figure it out. Please help.
Recommended answer
Set parse_dates to the first column with parse_dates=[0]:
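import pandas as pd

# dayfirst=True makes pandas read 02.01.2000 as 2 January, not 1 February
ts = pd.read_csv(file, sep=';', parse_dates=[0], index_col=0, dayfirst=True)
# ISO-formatted endpoints avoid any day/month ambiguity in the range
idx = pd.date_range('2000-01-01', '2000-02-01')
# reindex fills missing rows with NaN by default; fill_value='NaN' would insert the string 'NaN'
ts = ts.reindex(idx)
print(ts)
Prints:
            value
2000-01-01    1.0
2000-01-02    2.0
2000-01-03    NaN
...
2000-01-31    NaN
2000-02-01    3.0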
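To see why dayfirst matters for this file, here is a small check on a single ambiguous date. It is only an illustrative sketch, not part of the original answer, and the variable names are made up for the example:

```python
import pandas as pd

# Without a hint, pandas reads an ambiguous dotted date month-first.
month_first = pd.to_datetime("02.01.2000")
# dayfirst=True switches the interpretation to day.month.year.
day_first = pd.to_datetime("02.01.2000", dayfirst=True)

print(month_first.date())
print(day_first.date())
```

The first call should give 1 February 2000 and the second 2 January 2000, which is the swap seen in the question.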
parse_dates=[0] tells pandas to explicitly parse the first column as dates. From the docs:
parse_dates : boolean, list of ints or names, list of lists, or dict
If True -> try parsing the index.
If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
{'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'
A fast-path exists for iso8601-formatted dates.
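As an alternative to relying on dayfirst, the index can be converted with an explicit format string, which leaves nothing for the parser to guess. The following is a minimal self-contained sketch: the in-memory csv_text and its header names are assumptions standing in for the real file, and the semicolon separator is taken from the read_csv call above.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file from the question.
csv_text = "date;value\n01.01.2000;1\n02.01.2000;2\n01.02.2000;3\n"

# Read the first column as a plain string index, then convert it explicitly.
ts = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0)
# An explicit format removes the day/month ambiguity: day.month.year.
ts.index = pd.to_datetime(ts.index, format="%d.%m.%Y")

# Daily range from 1 January to 1 February 2000, written in ISO form.
idx = pd.date_range("2000-01-01", "2000-02-01")
# Dates missing from the file get NaN (the default fill value).
ts = ts.reindex(idx)

print(ts.loc[pd.Timestamp("2000-01-02"), "value"])
print(ts["value"].isna().sum())
```

With this approach the value 2 stays on 2 January, and every date between 3 January and 31 January is filled with NaN.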