How to transform a date range stored as two columns (start, end) to create a new row index and an accumulated rate for the values?


Problem description

I was wondering how to transform a date range stored as two columns (start, end) into a new row index. For example, I would like to convert the data below:

    end         start       value
0   2000-01-04  2000-01-02  6
1   2000-01-05  2000-01-03  9

To:

date        rate
2000-01-02  2
2000-01-03  5
2000-01-04  5
2000-01-05  3

Note:

The start and end columns define a date range, and the value is distributed evenly over the days in that range (both endpoints included) to give a daily rate. I am looking for the sum of all rates for each day.
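
The arithmetic behind the expected output can be checked by hand. The sketch below assumes both endpoints are inclusive and the value is split evenly across the days, which is what the expected table implies; `daily_rate` is a hypothetical helper name, not part of the original question.

```python
from datetime import date


def daily_rate(start, end, value):
    """Per-day share of `value` over the inclusive range start..end (assumed even split)."""
    n_days = (end - start).days + 1  # +1 because the end date itself is counted
    return value / n_days


# The two ranges from the question.
first = daily_rate(date(2000, 1, 2), date(2000, 1, 4), 6)
second = daily_rate(date(2000, 1, 3), date(2000, 1, 5), 9)

# 2000-01-03 and 2000-01-04 are covered by both ranges, so their rates add up.
overlap = first + second
print(first, second, overlap)
```

Under these assumptions, days covered by only one range keep that range's daily rate, and days covered by both get the sum.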

Recommended answer

import pandas as pd
import numpy as np
import io

temp=u"""end,start,value
2000-01-04,2000-01-02,6
2000-01-05,2000-01-03,9"""

df = pd.read_csv(io.StringIO(temp), parse_dates = [0,1])
print(df)
#change ordering for filling date from start to end
df = df[['start', 'end', 'value']]

#divide value by the number of days in the range; end - start does not count the first day, so one day is added
df['value'] = df['value']/(df['end'] + pd.Timedelta('1 days')- df['start']).dt.days

df['Id'] = df.index
#reshape the start and end dates from two columns into one column of rows
df = pd.melt(df, id_vars=[ 'value','Id'], var_name='D', value_name='Date')
#remove unnecessary column D
del df['D']
print(df)
#   value  Id       Date
#0      2   0 2000-01-02
#1      3   1 2000-01-03
#2      2   0 2000-01-04
#3      3   1 2000-01-05

#set multiindex
df = df.set_index(['Id', 'Date' ])

#fill gap between start and end dates, one Id at a time
f = lambda g: g.asfreq("D", method='ffill')
#select only 'value' so the result does not depend on whether apply passes the grouping column
df = df.reset_index(level=0).groupby('Id')[['value']].apply(f)

df = df.reset_index()
print(df)
#   Id       Date  value
#0   0 2000-01-02      2
#1   0 2000-01-03      2
#2   0 2000-01-04      2
#3   1 2000-01-03      3
#4   1 2000-01-04      3
#5   1 2000-01-05      3

#sum column value per date into column rate
df['rate'] = df.groupby('Date')['value'].transform('sum')
#delete unnecessary columns
df = df.drop(['Id', 'value'], axis=1 )
#drop duplicate rows
df = df.drop_duplicates()
print(df)
#
#        Date  rate
#0 2000-01-02     2
#1 2000-01-03     5
#2 2000-01-04     5
#5 2000-01-05     3
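
As a possible alternative to the melt/asfreq steps above, the same per-day sums can be sketched by expanding each range with `pd.date_range` and summing over the resulting dates. This is not the answer's method, only a minimal sketch that assumes the same inclusive, evenly split interpretation; the names `pieces` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2000-01-02", "2000-01-03"]),
    "end": pd.to_datetime(["2000-01-04", "2000-01-05"]),
    "value": [6, 9],
})

pieces = []
for row in df.itertuples(index=False):
    # Every calendar day in the range; date_range includes both endpoints.
    days = pd.date_range(row.start, row.end, freq="D")
    # Spread the value evenly over those days.
    pieces.append(pd.Series(row.value / len(days), index=days))

# Stack all per-range series and add up the rates that fall on the same date.
result = pd.concat(pieces).groupby(level=0).sum()
result.index.name = "date"
result.name = "rate"
print(result)
```

For the sample data this gives rates of 2, 5, 5 and 3 for 2000-01-02 through 2000-01-05, matching the table requested in the question. The explicit loop is easy to read but may be slow for a large number of ranges.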
