How to transform a date range stored as two columns (start, end) to create a new row index and an accumulated rate for the values?
Question
I was wondering how to transform a date range stored as two columns (start, end) into a new row index. For example, I would like to convert the data below:
         end      start  value
0 2000-01-04 2000-01-02      6
1 2000-01-05 2000-01-03      9
to:
      date  rate
2000-01-02     2
2000-01-03     5
2000-01-04     5
2000-01-05     3
Note:
start and end define an inclusive date range, and value is distributed evenly over the days in that range (6 over 3 days gives a daily rate of 2). I am looking for the sum of all daily rates for each day.
Recommended answer
import pandas as pd
import numpy as np
import io
temp=u"""end,start,value
2000-01-04,2000-01-02,6
2000-01-05,2000-01-03,9"""
df = pd.read_csv(io.StringIO(temp), parse_dates = [0,1])
print(df)
#change ordering for filling date from start to end
df = df[['start', 'end', 'value']]
#divide value by the number of days in the range; end - start does not count the first day, so one day is added
df['value'] = df['value']/(df['end'] + pd.Timedelta('1 days')- df['start']).dt.days
df['Id'] = df.index
#reshape datetimes from rows to columns
df = pd.melt(df, id_vars=[ 'value','Id'], var_name='D', value_name='Date')
#remove unnecessary column D
del df['D']
print(df)
# value Id Date
#0 2 0 2000-01-02
#1 3 1 2000-01-03
#2 2 0 2000-01-04
#3 3 1 2000-01-05
#set multiindex
df = df.set_index(['Id', 'Date' ])
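#fill the gap between start and end dates, one group per original row
f = lambda s: s.asfreq("D", method='ffill')
df = df.reset_index(level=0).groupby('Id')['value'].apply(f)
#the result is a Series indexed by (Id, Date); turn the index back into columns
df = df.reset_index()
print(df)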
# Id Date value
#0 0 2000-01-02 2
#1 0 2000-01-03 2
#2 0 2000-01-04 2
#3 1 2000-01-03 3
#4 1 2000-01-04 3
#5 1 2000-01-05 3
#sum column value to column rate
df['rate'] = df.groupby('Date')['value'].transform('sum')
#delete unnecessary columns
df = df.drop(['Id', 'value'], axis=1 )
#drop duplicate rows
df = df.drop_duplicates()
print(df)
#
# Date rate
#0 2000-01-02 2
#1 2000-01-03 5
#2 2000-01-04 5
#5 2000-01-05 3
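For comparison, the same result can probably be reached more directly by expanding each row into its own daily range and summing per date. This is only a sketch under the assumptions stated in the question (inclusive ranges, value spread evenly over the days); the names `pieces` and `result` are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "end": pd.to_datetime(["2000-01-04", "2000-01-05"]),
    "start": pd.to_datetime(["2000-01-02", "2000-01-03"]),
    "value": [6, 9],
})

# Build one small frame per original row: every day of the inclusive
# range, each carrying that row's daily rate (value / number of days).
pieces = [
    pd.DataFrame({
        "date": pd.date_range(start, end, freq="D"),
        "rate": value / ((end - start).days + 1),
    })
    for start, end, value in zip(df["start"], df["end"], df["value"])
]

# Stack the pieces and add up the daily rates that fall on the same date.
result = pd.concat(pieces).groupby("date", as_index=False)["rate"].sum()
print(result)
```

This avoids the melt, asfreq and drop_duplicates steps, at the cost of building one intermediate frame per input row, which may be slow for very large inputs.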