使用大表循环的python性能问题 [英] python performance problems using loops with big tables

查看:109
本文介绍了使用大表循环的python性能问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用python和诸如pandas和scipy之类的多个库来准备数据,因此我可以开始更深入的分析.例如,出于准备目的,我将创建两个日期不同的新列.
我的代码提供了预期的结果,但速度确实很慢,因此我无法将其用于具有80K行的表.运行时间大约需要只需80分钟的时间即可完成此表格的操作.

I am using python and multiple libaries like pandas and scipy to prepare data so I can start deeper analysis. For the preparation purpose I am for instance creating new columns with the difference of two dates.
My code is providing the expected results but is really slow so I cannot use it for a table with like 80K rows. The run time would take ca. 80 minutes for the table just for this simple operation.

问题肯定与我的写作操作有关:

The problem is definitely related with my writing operation:

tableContent[6]['p_test_Duration'].iloc[x] = difference


import time
from datetime import date, datetime

tableContent[6]['p_test_Duration'] = 0

#for x in range (0,len(tableContent[6]['p_test_Duration'])):
for x in range (0,1000):
    p_test_ZEIT_ANFANG = datetime.strptime(tableContent[6]['p_test_ZEIT_ANFANG'].iloc[x], '%Y-%m-%d %H:%M:%S')
    p_test_ZEIT_ENDE = datetime.strptime(tableContent[6]['p_test_ZEIT_ENDE'].iloc[x], '%Y-%m-%d %H:%M:%S')
    difference = p_test_ZEIT_ENDE - p_test_ZEIT_ANFANG

    tableContent[6]['p_test_Duration'].iloc[x] = difference

正确的结果表:

推荐答案

取消循环,并将函数应用于整个系列.

Take away the loop, and apply the functions to the whole series.

ZEIT_ANFANG = tableContent[6]['p_test_ZEIT_ANFANG'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
ZEIT_ENDE = tableContent[6]['p_test_ZEIT_ENDE'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
tableContent[6]['p_test_Duration'] = ZEIT_ENDE - ZEIT_ANFANG

这篇关于使用大表循环的python性能问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆