Conversion of a timedelta to int very slow in Python
Problem description
I have a dataframe with two columns, each holding a set of dates. I want to compute the difference between the dates and return the number of days. However, the process shown below is very slow. Does anyone know how to speed it up? This code is used on a big file, so speed is important.
from datetime import datetime
import pandas as pd
dfx = pd.DataFrame([[datetime(2014, 1, 1), datetime(2014, 1, 10)],
                    [datetime(2014, 1, 1), datetime(2015, 1, 10)],
                    [datetime(2013, 1, 1), datetime(2014, 1, 12)]], columns=['x', 'y'])
dfx['diffx'] = dfx['y'] - dfx['x']                  # timedelta column
dfx['diff'] = dfx['diffx'].apply(lambda x: x.days)  # slow: one Python call per row
dfx
Final goal: an integer column (diff) holding the number of days between x and y.
You may find a massive speed-up by dropping down to NumPy, which bypasses the overhead associated with pd.Series objects.
See also "pd.Timestamp versus np.datetime64: are they interchangeable for selected uses?".
# Python 3.6.0, Pandas 0.19.2, NumPy 1.11.3
import numpy as np
import pandas as pd

def days_lambda(dfx):
    # element-wise Python call on every timedelta
    return (dfx['y'] - dfx['x']).apply(lambda x: x.days)

def days_pd(dfx):
    # pandas .dt accessor on the timedelta Series
    return (dfx['y'] - dfx['x']).dt.days

def days_np(dfx):
    # raw NumPy arrays; dividing by one day yields float days
    return (dfx['y'].values - dfx['x'].values) / np.timedelta64(1, 'D')

# check results are identical
assert (days_lambda(dfx).values == days_pd(dfx).values).all()
assert (days_lambda(dfx).values == days_np(dfx)).all()

# repeat the 3-row frame to get 300,000 rows for timing
dfx = pd.concat([dfx]*100000)

%timeit days_lambda(dfx)  # 5.02 s per loop
%timeit days_pd(dfx)      # 5.6 s per loop
%timeit days_np(dfx)      # 4.72 ms per loop
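Note that days_np returns floating-point days, whereas the question asks for an int. The following is a minimal sketch of one way to get integers while staying in NumPy. The helper name days_np_int and its start/end parameters are hypothetical, and it assumes neither column contains missing values (NaT), since those become NaN after the division and cannot be cast to int meaningfully.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def days_np_int(df, start="x", end="y"):
    """Return whole-day differences between two datetime columns as int64.

    Hypothetical helper for illustration; assumes no NaT values.
    """
    # .values exposes the underlying datetime64 arrays, so the
    # subtraction happens in NumPy and yields a timedelta64 array.
    delta = df[end].values - df[start].values
    # Dividing by one day gives float days; flooring before the cast
    # rounds toward negative infinity, as timedelta.days does.
    return np.floor(delta / np.timedelta64(1, "D")).astype(np.int64)


dfx = pd.DataFrame(
    [
        [datetime(2014, 1, 1), datetime(2014, 1, 10)],
        [datetime(2014, 1, 1), datetime(2015, 1, 10)],
        [datetime(2013, 1, 1), datetime(2014, 1, 12)],
    ],
    columns=["x", "y"],
)
dfx["diff"] = days_np_int(dfx)
```

The floor step only matters when the differences include fractions of a day; for date-only data like the example it is a no-op, and the cast simply changes the dtype.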