Vectorized alternative to iterrows

Question

I am trying to speed up my script and came across the following answer: Iterrows Performance Issues. That answer says that iterrows is rarely needed.

In my code I use iterrows because it is straightforward and intuitive, but it is also very slow. I would therefore like to vectorize the parts of my code that use iterrows. Below are two examples for which I could not find a solution. In both, the values in the columns are all datetime values in the format %Y-%m-%d %H:%M:%S

from datetime import timedelta

for index, row in df.iterrows():
    df.loc[index, 'Time_Between']= row['Time_Begin'] + timedelta(seconds=row['Some_Integer_Seconds_In_A_Column'])
    df.loc[index, 'Time_Required']= row['Time_End'] - timedelta(seconds=SomeIntegerSecondsAsAVariable)
    df.loc[index, 'Tota_Time']= ((row['Time_Begin'] - row['Time_First']).total_seconds())/60


for index, row in df.iterrows():
    if row['Time_Required'] > row['Time_Between']:
        df.loc[index, 'Check']= 0
    else:
        df.loc[index, 'Check']= 1

How can I vectorize this? I tried masking and apply, but could not get anything to work. Most of the time I get: TypeError: Cannot change data-type for object array. That is an error I do not get with iterrows.
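One possible cause of that TypeError, offered as an assumption because the question does not show the column dtypes, is that the time columns hold strings (object dtype) rather than real datetime values. Row-wise code can hide this, while column-wise arithmetic requires a datetime dtype. A minimal sketch with hypothetical sample data, converting the columns first with pd.to_datetime:

```python
import pandas as pd

# Hypothetical sample data: timestamps stored as plain strings, not datetimes
df = pd.DataFrame({
    'Time_Begin': ['2015-10-15 10:00:00', '2015-10-15 12:00:00'],
    'Time_End': ['2015-11-15 00:00:00', '2015-10-18 00:00:00'],
})

# Parse each column once, using the format given in the question
fmt = '%Y-%m-%d %H:%M:%S'
for col in ['Time_Begin', 'Time_End']:
    df[col] = pd.to_datetime(df[col], format=fmt)

# The columns now have a datetime64 dtype, so whole-column arithmetic works
print(df.dtypes)
print(df['Time_End'] - df['Time_Begin'])
```

Checking df.dtypes before vectorizing is a cheap way to confirm whether this applies to your data.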

Recommended answer

I think you can use:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Time_End': {0: pd.Timestamp('2015-11-15 00:00:00'), 1: pd.Timestamp('2015-10-18 00:00:00'), 2: pd.Timestamp('2015-10-17 00:00:00'), 3: pd.Timestamp('2015-10-16 00:00:00')}, 'Int_Sec': {0: 4, 1: 2, 2: 7, 3: 10}, 'Time_First': {0: pd.Timestamp('2015-10-15 00:00:00'), 1: pd.Timestamp('2015-10-15 00:00:00'), 2: pd.Timestamp('2015-12-15 00:00:00'), 3: pd.Timestamp('2015-12-15 00:00:00')}, 'Time_Begin': {0: pd.Timestamp('2015-10-15 10:00:00'), 1: pd.Timestamp('2015-10-15 12:00:00'), 2: pd.Timestamp('2015-12-15 10:00:00'), 3: pd.Timestamp('2015-12-15 10:00:00')}})
print (df)
   Int_Sec          Time_Begin   Time_End Time_First
0        4 2015-10-15 10:00:00 2015-11-15 2015-10-15
1        2 2015-10-15 12:00:00 2015-10-18 2015-10-15
2        7 2015-12-15 10:00:00 2015-10-17 2015-12-15
3       10 2015-12-15 10:00:00 2015-10-16 2015-12-15

Sec_Var = 20
df['Time_Between'] = df['Time_Begin'] + pd.to_timedelta(df['Int_Sec'], unit='s')
df['Time_Required'] = df['Time_End'] - pd.to_timedelta(Sec_Var, unit='s')
df['Tota_Time'] = ((df['Time_Begin'] - df['Time_First']).dt.total_seconds()) / 60

df['Check'] = np.where(df['Time_Required'] > df['Time_Between'], 0, 1)

print (df)

   Int_Sec          Time_Begin   Time_End Time_First        Time_Between  \
0        4 2015-10-15 10:00:00 2015-11-15 2015-10-15 2015-10-15 10:00:04   
1        2 2015-10-15 12:00:00 2015-10-18 2015-10-15 2015-10-15 12:00:02   
2        7 2015-12-15 10:00:00 2015-10-17 2015-12-15 2015-12-15 10:00:07   
3       10 2015-12-15 10:00:00 2015-10-16 2015-12-15 2015-12-15 10:00:10   

        Time_Required  Tota_Time  Check  
0 2015-11-14 23:59:40      600.0      0  
1 2015-10-17 23:59:40      720.0      0  
2 2015-10-16 23:59:40      600.0      1  
3 2015-10-15 23:59:40      600.0      1  
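As a variation on the Check column, the comparison itself already yields a boolean Series, so it can be cast to integers instead of going through np.where. The sketch below reuses the sample values from the answer, reduced to the columns this step needs; the names check_where and check_astype are illustrative only. One caveat: comparisons involving NaT evaluate to False, so the two forms would differ on rows with missing timestamps (np.where would give 1, the cast would give 0).

```python
import numpy as np
import pandas as pd

# Same sample values as in the answer, limited to the columns used here
df = pd.DataFrame({
    'Time_Begin': pd.to_datetime(['2015-10-15 10:00:00', '2015-10-15 12:00:00',
                                  '2015-12-15 10:00:00', '2015-12-15 10:00:00']),
    'Time_End': pd.to_datetime(['2015-11-15', '2015-10-18',
                                '2015-10-17', '2015-10-16']),
    'Int_Sec': [4, 2, 7, 10],
})
Sec_Var = 20

# Per-row offsets come from a column; the constant offset is a single Timedelta
df['Time_Between'] = df['Time_Begin'] + pd.to_timedelta(df['Int_Sec'], unit='s')
df['Time_Required'] = df['Time_End'] - pd.Timedelta(seconds=Sec_Var)

# Form used in the answer: 0 where Time_Required is later, otherwise 1
check_where = np.where(df['Time_Required'] > df['Time_Between'], 0, 1)

# Alternative: negate the condition and cast the boolean result to int
check_astype = (df['Time_Required'] <= df['Time_Between']).astype(int)

print(check_where.tolist())
print(check_astype.tolist())
```

Both forms avoid the Python-level loop; which one reads better is largely a matter of taste when the data has no missing values.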
