Finding time difference between two columns in DataFrame
Question
I am trying to find the time difference between two columns of the following frame:
Test Date | Test Type | First Use Date
I used the following function definition to get the difference:
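from datetime import datetime

def days_between(d1, d2):
    # Parse both ISO-formatted date strings, then return the absolute gap in whole days
    d1 = datetime.strptime(d1, "%Y-%m-%d")
    d2 = datetime.strptime(d2, "%Y-%m-%d")
    return abs((d2 - d1).days)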
It works fine, but it does not accept a Series as input, so I had to write a for loop over the row indices:
age_veh = []
for i in range(0, len(data_manufacturer)-1):
    age_veh[i].append(days_between(data_manufacturer.iloc[i,0], data_manufacturer.iloc[i,4]))
However, it raises an error: IndexError: list index out of range
I don't know whether this is the right approach or what I am doing wrong; an alternative solution would be much appreciated. Please also bear in mind that I have around 2 million rows.
Recommended answer
Convert both columns with to_datetime. Subtracting one column from the other then produces a Series of timedelta values; call abs on it to drop the sign, and then dt.days to get the total number of days. Example:
In [119]:
import io
import pandas as pd
t="""Test Date,Test Type,First Use Date
2011-02-05,A,2010-01-05
2012-02-05,A,2010-03-05
2013-02-05,A,2010-06-05
2014-02-05,A,2010-08-05"""
df = pd.read_csv(io.StringIO(t))
df
Out[119]:
Test Date Test Type First Use Date
0 2011-02-05 A 2010-01-05
1 2012-02-05 A 2010-03-05
2 2013-02-05 A 2010-06-05
3 2014-02-05 A 2010-08-05
In [121]:
df['Test Date'] = pd.to_datetime(df['Test Date'])
df['First Use Date'] = pd.to_datetime(df['First Use Date'])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 3 columns):
Test Date 4 non-null datetime64[ns]
Test Type 4 non-null object
First Use Date 4 non-null datetime64[ns]
dtypes: datetime64[ns](2), object(1)
memory usage: 128.0+ bytes
In [122]:
df['days'] = (df['Test Date'] - df['First Use Date']).abs().dt.days
df
Out[122]:
Test Date Test Type First Use Date days
0 2011-02-05 A 2010-01-05 396
1 2012-02-05 A 2010-03-05 702
2 2013-02-05 A 2010-06-05 976
3 2014-02-05 A 2010-08-05 1280
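As for the IndexError in the question: age_veh starts out as an empty list, so age_veh[i] fails on the very first iteration, and range(0, len(...)-1) would also skip the last row. The following is a minimal sketch, not a definitive implementation, that puts the corrected loop next to the vectorized approach on a two-row sample. The helper name days_between_columns and its fmt parameter are hypothetical, introduced here only for illustration, and the sketch assumes every date string is in YYYY-MM-DD form.

```python
from datetime import datetime

import pandas as pd


def days_between(d1, d2):
    # Row-wise helper from the question: absolute gap in whole days
    d1 = datetime.strptime(d1, "%Y-%m-%d")
    d2 = datetime.strptime(d2, "%Y-%m-%d")
    return abs((d2 - d1).days)


def days_between_columns(df, col_a, col_b, fmt="%Y-%m-%d"):
    # Hypothetical helper: vectorized absolute difference in whole days
    a = pd.to_datetime(df[col_a], format=fmt)
    b = pd.to_datetime(df[col_b], format=fmt)
    return (a - b).abs().dt.days


# Two rows taken from the sample data in the answer
df = pd.DataFrame({
    "Test Date": ["2011-02-05", "2012-02-05"],
    "First Use Date": ["2010-01-05", "2010-03-05"],
})

# Corrected loop: append to the list itself and visit every row
age_veh = []
for i in range(len(df)):
    age_veh.append(days_between(df.iloc[i, 0], df.iloc[i, 1]))

# Vectorized version: no Python-level loop over rows
df["days"] = days_between_columns(df, "Test Date", "First Use Date")

print(age_veh)
print(df["days"].tolist())
```

Both approaches should give the same numbers. With around 2 million rows, the vectorized version is likely to be the better choice because it avoids calling strptime once per cell from Python.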