Re-index dataframe by new range of dates
Question
I have a data frame containing a number of observations:
date colour orders
2014-10-20 red 7
2014-10-21 red 10
2014-10-20 yellow 3
I would like to re-index the data frame and standardise the dates, so that every colour has a row for every date in the range:
date colour orders
2014-10-20 red 7
2014-10-21 red 10
2014-10-22 red NaN
2014-10-20 yellow 3
2014-10-21 yellow NaN
2014-10-22 yellow NaN
I thought I would order the data frame by colour and date, and then try to re-index it.
index = pd.date_range('20/10/2014', '22/10/2014')
test_df = df.sort_values(['colour', 'date'], ascending=[True, True])
ts = test_df.reindex(index)
ts
But it returns a new data frame with the right index and NaN in every column.
date colour orders
2014-10-20 NaN NaN
2014-10-21 NaN NaN
2014-10-22 NaN NaN
Recommended answer
Starting from your example dataframe:
In [51]: df
Out[51]:
date colour orders
0 2014-10-20 red 7
1 2014-10-21 red 10
2 2014-10-20 yellow 3
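The all-NaN result in the question follows from how reindex matches rows: it looks up the requested labels in the existing index, and the frame above still has the default integer index 0, 1, 2. A minimal sketch of that behaviour, assuming the 'date' column holds datetimes (the variable names attempt and all_missing are illustrative only):

```python
import pandas as pd

# Rebuild the example frame; 'date' is assumed to be a datetime column.
df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

index = pd.date_range("2014-10-20", "2014-10-22")

# The frame is still indexed by 0, 1, 2, so none of the requested date
# labels are found and every column of every row comes back missing.
attempt = df.reindex(index)
all_missing = bool(attempt.isna().all().all())
```

Sorting the rows first does not change this, because sorting reorders rows without changing their index labels.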
If you want to reindex on both 'date' and 'colour', one possibility is to set both as the index (a multi-index):
In [52]: df = df.set_index(['date', 'colour'])
In [53]: df
Out[53]:
orders
date colour
2014-10-20 red 7
2014-10-21 red 10
2014-10-20 yellow 3
You can now reindex this dataframe, after you have constructed the desired index:
In [54]: index = pd.date_range('20/10/2014', '22/10/2014')
In [55]: multi_index = pd.MultiIndex.from_product([index, ['red', 'yellow']])
In [56]: df.reindex(multi_index)
Out[56]:
orders
2014-10-20 red 7
yellow 3
2014-10-21 red 10
yellow NaN
2014-10-22 red NaN
yellow NaN
To get the same output as in your example, the index should be sorted on the second level (level=1, as levels are 0-based):
In [60]: df2 = df.reindex(multi_index)
In [64]: df2.sort_index(level=1)
Out[64]:
orders
2014-10-20 red 7
2014-10-21 red 10
2014-10-22 red NaN
2014-10-20 yellow 3
2014-10-21 yellow NaN
2014-10-22 yellow NaN
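For reference, the steps above can be put together in one self-contained script. This is only a sketch under a few assumptions: the 'date' column is parsed as datetimes, the date range is written in ISO format to avoid day/month ambiguity, the index levels are given names (not done in the session above), and reset_index is added at the end to get back the column layout shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

# Every (date, colour) combination that should appear in the result.
dates = pd.date_range("2014-10-20", "2014-10-22")
multi_index = pd.MultiIndex.from_product(
    [dates, ["red", "yellow"]], names=["date", "colour"]
)

result = (
    df.set_index(["date", "colour"])
      .reindex(multi_index)        # missing combinations become NaN
      .sort_index(level="colour")  # colour first, then date
      .reset_index()               # back to plain columns
)
```

Because reindexing introduces NaN, the orders column is stored as floats in the result.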
A possible way to generate the multi-index automatically would be (with your original frame):
pd.MultiIndex.from_product([pd.date_range(df['date'].min(), df['date'].max(), freq='D'),
df['colour'].unique()])
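One thing to keep in mind with this automatic version is that the range runs only from the earliest to the latest date present in the data. A short sketch with the example frame, assuming 'date' is a datetime column (the names auto_index and filled are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
})

# Dates run from the smallest to the largest date found in the frame.
auto_index = pd.MultiIndex.from_product(
    [pd.date_range(df["date"].min(), df["date"].max(), freq="D"),
     df["colour"].unique()],
    names=["date", "colour"],
)

filled = df.set_index(["date", "colour"]).reindex(auto_index)
```

With the example data the latest date is 2014-10-21, so the index has 2 dates x 2 colours = 4 entries and 2014-10-22 is not included. If a fixed end date is needed, it has to be passed to date_range explicitly.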
Another way would be to use resample for each colour group:
In [77]: df = df.set_index('date')
In [78]: df.groupby('colour').resample('D')
This is simpler, but it does not give you the full range of dates for each colour, only the range of dates that is available within that colour group.
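In recent pandas versions resample returns an intermediate object and needs an explicit aggregation. The sketch below assumes mean() as that aggregation and selects the 'orders' column first; both choices are assumptions for illustration, not part of the original session.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-20", "2014-10-21", "2014-10-20"]),
    "colour": ["red", "red", "yellow"],
    "orders": [7, 10, 3],
}).set_index("date")

# Resample each colour group to daily frequency. Each group only spans
# its own first-to-last date, so yellow gets a single row here.
per_colour = df.groupby("colour")["orders"].resample("D").mean()
```

The result is a Series indexed by (colour, date). With the example data it has three rows, which illustrates the limitation described above: yellow has only one observed date, so no extra rows are created for it.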