Pandas Vectorized Date Offset Operations with Vector of Differing Offsets

Views: 140 Published: 2020/5/24 0:43:53 python pandas

Problem description

I am trying to do the following, but it seems that vectorized operations in this mode are not supported.

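import pandas as pd

df = pd.DataFrame([[2017, 1, 15, 1],
                   [2017, 1, 15, 2],
                   [2017, 1, 15, 3],
                   [2017, 1, 15, 4],
                   [2017, 1, 15, 5],
                   [2017, 1, 15, 6],
                   [2017, 1, 15, 7]],
                  columns=['year', 'month', 'day', 'month_offset'])
# Build the date column from the year/month/day columns
# (pd.datetime is no longer available in recent pandas releases)
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
# One MonthEnd offset object per row, each with its own month count
df['offset'] = df.apply(lambda g: pd.offsets.MonthEnd(g.month_offset), axis=1)
# Adding the object-dtype column of offsets is what triggers the warning
df['date_offset'] = df.date + df.offset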

This is the warning returned for the last statement in the code snippet:

C:\Python3.5.2.3\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py:533: PerformanceWarning: Adding/subtracting array of DateOffsets to Series not vectorized
  "Series not vectorized", PerformanceWarning)

I would like this to work as a vectorized operation because of the performance benefits.

Thanks.
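One possible way to get closer to a vectorized operation, sketched here under stated assumptions rather than taken from the original post: adding a single `MonthEnd(n)` to a whole datetime Series is something pandas can generally handle as one array operation, so the rows can be grouped by their distinct `month_offset` values and one addition performed per group. The helper name `add_month_ends` is hypothetical, and the sketch assumes the Series index is unique.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-15"] * 7),
    "month_offset": [1, 2, 3, 4, 5, 6, 7],
})


def add_month_ends(dates: pd.Series, months: pd.Series) -> pd.Series:
    """Hypothetical helper: one Series + MonthEnd(n) addition per distinct n.

    Assumes `dates` and `months` share the same unique index.
    """
    pieces = []
    # .groups maps each distinct offset value to the index labels that use it
    for n, labels in months.groupby(months).groups.items():
        pieces.append(dates.loc[labels] + pd.offsets.MonthEnd(int(n)))
    # Put the per-group results back into the original row order
    return pd.concat(pieces).reindex(dates.index)


df["date_offset"] = add_month_ends(df["date"], df["month_offset"])
print(df)
```

This helps most when there are few distinct offsets compared with the number of rows. With as many distinct offsets as rows, it degenerates into a Python loop over single-row groups.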

To end, here is a comparison of methods following on from @john-zwinck:

import time
import pandas as pd
import numpy as np

df = pd.DataFrame([[2017, 1, 1, 1],
                   [2017, 1, 1, 2],
                   [2017, 1, 1, 3],
                   [2017, 1, 1, 4],
                   [2017, 1, 1, 5],
                   [2017, 1, 1, 6],
                   [2017, 1, 1, 7]],
                  columns=['year', 'month', 'day', 'month_offset'])

# pd.datetime is no longer available in recent pandas; build the dates directly
df['mydate'] = pd.to_datetime(df[['year', 'month', 'day']])

# Method 1: row-wise apply, one MonthEnd offset per row
start_time = time.time()
df['pandas_offset'] = df.apply(
    lambda g: g.mydate + pd.offsets.MonthEnd(g.month_offset), axis=1)
end_time = time.time()
print('Method 1 {} seconds'.format(end_time - start_time))

# Method 2: NumPy month arithmetic. Truncate to month precision, add the
# offsets in months, then step back one day to land on a month end.
start_time = time.time()
df['numpy_offset'] = (
    (df.mydate.values.astype('M8[M]')
     + df.month_offset.values * np.timedelta64(1, 'M')).astype('M8[D]')
    - np.timedelta64(1, 'D'))
end_time = time.time()
print('Method 2 with numpy vectorization {} seconds'.format(end_time - start_time))

Results:

   year  month  day  month_offset     mydate    offset1      final
0  2017      1    1             1 2017-01-01 2017-01-31 2017-01-31
1  2017      1    1             2 2017-01-01 2017-02-28 2017-02-28
2  2017      1    1             3 2017-01-01 2017-03-31 2017-03-31
3  2017      1    1             4 2017-01-01 2017-04-30 2017-04-30
4  2017      1    1             5 2017-01-01 2017-05-31 2017-05-31
5  2017      1    1             6 2017-01-01 2017-06-30 2017-06-30
6  2017      1    1             7 2017-01-01 2017-07-31 2017-07-31


runfile('C:/bitbucket/test/vector_dates.py', wdir='C:/bitbucket/test')
Method 1 0.003999948501586914 seconds
Method 2 with numpy vectorization 0.0009999275207519531 seconds

Clearly the numpy version is much faster.
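One caveat worth checking before relying on the numpy expression: as written, it matches `pd.offsets.MonthEnd(n)` only for positive `n` and for dates that are not already on a month end. For example, `2017-01-31 + MonthEnd(1)` is `2017-02-28` in pandas, while the expression above stays on `2017-01-31`. The following is a minimal sketch, not part of the original post, that adds one extra month for dates already on a month end and compares the result with the row-wise pandas calculation. The function name `add_month_ends_numpy` is hypothetical, and the sketch assumes positive integer offsets and timezone-naive dates.

```python
import numpy as np
import pandas as pd


def add_month_ends_numpy(dates: pd.Series, months: pd.Series) -> np.ndarray:
    """Hypothetical helper mimicking `date + MonthEnd(n)` for n > 0."""
    days = dates.to_numpy().astype("datetime64[D]")
    month = days.astype("datetime64[M]")
    # True where the date is already the last day of its month
    on_month_end = (days + np.timedelta64(1, "D")).astype("datetime64[M]") != month
    # pandas rolls a month-end date to the NEXT month end, so add one more month
    steps = months.to_numpy() + on_month_end.astype("int64")
    first_of_target = (month + steps * np.timedelta64(1, "M")).astype("datetime64[D]")
    # One day before the first of the target month is a month end
    return first_of_target - np.timedelta64(1, "D")


dates = pd.Series(pd.to_datetime(
    ["2017-01-01", "2017-01-15", "2017-01-31", "2016-02-10"]))
months = pd.Series([1, 2, 1, 1])

result = add_month_ends_numpy(dates, months)
# Row-wise pandas reference for comparison
expected = [d + pd.offsets.MonthEnd(int(n)) for d, n in zip(dates, months)]
print(pd.DataFrame({"numpy": result, "pandas": expected}))
```

Offsets of zero or below follow different rollover rules in pandas and are deliberately left out of this sketch.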
