Calculate datetime-difference in years, months, etc. in a new pandas dataframe column
Question
I have a pandas dataframe looking like this:
Name start end
A 2000-01-10 1970-04-29
I want to add a new column providing the difference between the start and end columns in years, months, and days.
So the result should look like this:
Name start end diff
A 2000-01-10 1970-04-29 29y9m etc.
The diff column may also be a datetime object or a timedelta object, but the key point for me is that I can easily get the year and month out of it.
What I have tried so far is:
df['diff'] = df['end'] - df['start']
This results in the new column containing 10848 days. However, I do not know how to convert the days to 29y9m etc.
Recommended answer
With a simple function you can reach your goal.
The function estimates the difference in years and months from the day count, treating a year as 364 days and a month as 30 days, so the result is approximate.
import datetime
import pandas as pd

def parse_date(td):
    # Approximation: a year is treated as 364 days and a month as 30 days.
    resYear = float(td.days) / 364.0  # number of years, including the fractional part
    # Turn the fractional part of the year back into days, then into whole 30-day months.
    resMonth = int((resYear - int(resYear)) * 364 / 30)
    resYear = int(resYear)
    return str(resYear) + "Y" + str(resMonth) + "m"

df = pd.DataFrame([("2000-01-10", "1970-04-29")], columns=["start", "end"])
# Parse both date strings, subtract them to get a timedelta, and format it.
df["delta"] = [
    parse_date(datetime.datetime.strptime(start, "%Y-%m-%d") - datetime.datetime.strptime(end, "%Y-%m-%d"))
    for start, end in zip(df["start"], df["end"])
]
print(df)
start end delta
0 2000-01-10 1970-04-29 29Y9m
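Because the function above works from a fixed 364-day year and 30-day month, its result can be off by a month compared with the calendar. If an exact calendar difference is needed, one possible approach is to count whole months between the two dates and then count the leftover days. The sketch below uses only the standard library; the names calendar_diff and format_diff are hypothetical helpers introduced for illustration, not part of pandas or of the original answer.

```python
import calendar
from datetime import date


def calendar_diff(a, b):
    """Return (years, months, days) between two dates, regardless of order."""
    earlier, later = sorted((a, b))
    # Count whole months elapsed; drop one if the day-of-month has not been reached yet.
    total_months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        total_months -= 1
    # Move `earlier` forward by total_months, clamping the day to the month's length.
    month_index = earlier.month - 1 + total_months
    anchor_year = earlier.year + month_index // 12
    anchor_month = month_index % 12 + 1
    last_day = calendar.monthrange(anchor_year, anchor_month)[1]
    anchor = date(anchor_year, anchor_month, min(earlier.day, last_day))
    years, months = divmod(total_months, 12)
    return years, months, (later - anchor).days


def format_diff(a, b):
    """Format the calendar difference in the 29y8m12d style."""
    years, months, days = calendar_diff(a, b)
    return f"{years}y{months}m{days}d"


print(format_diff(date(2000, 1, 10), date(1970, 4, 29)))  # prints 29y8m12d
```

For the dates in the question this gives 29 years, 8 months and 12 days, whereas the approximation above reports 9 months. To use it on a dataframe, the start and end columns would first need to hold datetime.date values (for example via pd.to_datetime(...).dt.date), after which the helper can be applied pair by pair in the same list comprehension style as the answer.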