Finding time difference between two columns in DataFrame


Question

I am trying to find the time difference between two columns of the following frame:

Test Date | Test Type | First Use Date

I used the following function definition to get the difference:

from datetime import datetime

def days_between(d1, d2):
    # d1 and d2 are date strings such as "2011-02-05"
    d1 = datetime.strptime(d1, "%Y-%m-%d")
    d2 = datetime.strptime(d2, "%Y-%m-%d")
    return abs((d2 - d1).days)

It works fine, but it does not accept a Series as input, so I had to write a for loop over the indices:

age_veh = []
for i in range(0, len(data_manufacturer)-1):
    age_veh[i].append(days_between(data_manufacturer.iloc[i,0], data_manufacturer.iloc[i,4]))

However, it raises an error: IndexError: list index out of range

I don't know whether this is the right approach or what I am doing wrong, so an alternative solution would be much appreciated. Please also bear in mind that I have around 2 million rows.

Recommended answer

Convert the columns using to_datetime, then subtract one column from the other to produce a timedelta Series. Call abs() on the result so the sign does not matter, then use dt.days to get the number of whole days. Example:

In [119]:
import io
import pandas as pd
t="""Test Date,Test Type,First Use Date
2011-02-05,A,2010-01-05
2012-02-05,A,2010-03-05
2013-02-05,A,2010-06-05
2014-02-05,A,2010-08-05"""
df = pd.read_csv(io.StringIO(t))
df
Out[119]:
    Test Date Test Type First Use Date
0  2011-02-05         A     2010-01-05
1  2012-02-05         A     2010-03-05
2  2013-02-05         A     2010-06-05
3  2014-02-05         A     2010-08-05

In [121]:    
df['Test Date'] = pd.to_datetime(df['Test Date'])
df['First Use Date'] = pd.to_datetime(df['First Use Date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 3 columns):
Test Date         4 non-null datetime64[ns]
Test Type         4 non-null object
First Use Date    4 non-null datetime64[ns]
dtypes: datetime64[ns](2), object(1)
memory usage: 128.0+ bytes

In [122]:
df['days'] = (df['Test Date'] - df['First Use Date']).abs().dt.days
df

Out[122]:
   Test Date Test Type First Use Date  days
0 2011-02-05         A     2010-01-05   396
1 2012-02-05         A     2010-03-05   702
2 2013-02-05         A     2010-06-05   976
3 2014-02-05         A     2010-08-05  1280
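
For comparison with the question, the same steps can be wrapped in a Series-accepting version of days_between. This is only a sketch: the explicit format argument is an assumption carried over from the "%Y-%m-%d" pattern in the question, and the two-row frame is a cut-down copy of the sample data above.

```python
import pandas as pd


def days_between(d1: pd.Series, d2: pd.Series) -> pd.Series:
    """Absolute difference in whole days between two date columns."""
    # The format is assumed to match the "%Y-%m-%d" strings from the question.
    d1 = pd.to_datetime(d1, format="%Y-%m-%d")
    d2 = pd.to_datetime(d2, format="%Y-%m-%d")
    # Subtracting datetime columns gives a timedelta Series; abs() drops the sign.
    return (d2 - d1).abs().dt.days


df = pd.DataFrame({
    "Test Date": ["2011-02-05", "2012-02-05"],
    "Test Type": ["A", "A"],
    "First Use Date": ["2010-01-05", "2010-03-05"],
})
df["days"] = days_between(df["Test Date"], df["First Use Date"])
print(df["days"].tolist())  # [396, 702]
```

As for the original loop, the IndexError comes from age_veh[i].append(...), which indexes into a list that is still empty; age_veh.append(...) would avoid it. Note also that range(0, len(data_manufacturer)-1) stops one row short of the end. With around 2 million rows, the column-wise version avoids a Python-level loop entirely.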
