pandas 重新采样上采样日期/边缘数据 [英] Pandas Resample Upsample last date / edge of data
本文介绍了 pandas 重新采样上采样日期/边缘数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在尝试将每周数据上采样到每日数据,但是,我在上采样最后一个边缘时遇到困难.我该怎么办?
I'm trying to upsample weekly data to daily data, however, I'm having difficulty upsampling the last edge. How can I go about this?
import pandas as pd
import datetime
df = pd.DataFrame({'wk start': ['2018-08-12', '2018-08-12', '2018-08-19'],
'car': [ 'tesla model 3', 'tesla model x', 'tesla model 3'],
'sales':[38000,98000, 40000]})
df['wk start'] = df['wk start'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d'))
df.set_index('wk start').groupby('car').resample('D').pad()
这将返回:
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
我想要的输出是:
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
2018-08-20 tesla model 3 40000
2018-08-21 tesla model 3 40000
2018-08-22 tesla model 3 40000
2018-08-23 tesla model 3 40000
2018-08-24 tesla model 3 40000
2018-08-25 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
2018-08-13 tesla model x 98000
2018-08-14 tesla model x 98000
2018-08-15 tesla model x 98000
2018-08-16 tesla model x 98000
2018-08-17 tesla model x 98000
2018-08-18 tesla model x 98000
我查看了,但是他们正在使用句点,而我正在查看日期时间.预先感谢!
I looked at this, but they're using periods and I'm looking at datetimes. Thanks in advance!
推荐答案
是的,您是正确的,排除了最后的边缘数据.解决方案是将它们添加到输入 DataFrame
中-我的解决方案使用 concat
到原始的 df
在使用您的解决方案之前:
Yes, you are right, last edge data are excluded. Solution is add them to input DataFrame
- my solution creates a helper Dataframe
using drop_duplicates
, adds 6
days and concat
's to original df
before using your solution:
df1 = df.sort_values('wk start').drop_duplicates('car', keep='last').copy()
df1['wk start'] = df1['wk start'] + pd.Timedelta(6, unit='d')
df = pd.concat([df, df1], ignore_index=True)
df = df.set_index('wk start').groupby('car').resample('D').pad()
print (df)
car sales
car wk start
tesla model 3 2018-08-12 tesla model 3 38000
2018-08-13 tesla model 3 38000
2018-08-14 tesla model 3 38000
2018-08-15 tesla model 3 38000
2018-08-16 tesla model 3 38000
2018-08-17 tesla model 3 38000
2018-08-18 tesla model 3 38000
2018-08-19 tesla model 3 40000
2018-08-20 tesla model 3 40000
2018-08-21 tesla model 3 40000
2018-08-22 tesla model 3 40000
2018-08-23 tesla model 3 40000
2018-08-24 tesla model 3 40000
2018-08-25 tesla model 3 40000
tesla model x 2018-08-12 tesla model x 98000
2018-08-13 tesla model x 98000
2018-08-14 tesla model x 98000
2018-08-15 tesla model x 98000
2018-08-16 tesla model x 98000
2018-08-17 tesla model x 98000
2018-08-18 tesla model x 98000
这篇关于 pandas 重新采样上采样日期/边缘数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文