Python performance problems using loops with big tables
Question
I am using Python and several libraries such as pandas and SciPy to prepare data so that I can start a deeper analysis. As part of the preparation I create, for instance, new columns holding the difference between two dates.
My code produces the expected results but is really slow, so I cannot use it for a table with about 80K rows. The run time would be roughly 80 minutes for that table, just for this simple operation.
The problem is definitely related to my write operation:
tableContent[6]['p_test_Duration'].iloc[x] = difference
import time
from datetime import date, datetime
tableContent[6]['p_test_Duration'] = 0
#for x in range (0,len(tableContent[6]['p_test_Duration'])):
for x in range (0,1000):
    p_test_ZEIT_ANFANG = datetime.strptime(tableContent[6]['p_test_ZEIT_ANFANG'].iloc[x], '%Y-%m-%d %H:%M:%S')
    p_test_ZEIT_ENDE = datetime.strptime(tableContent[6]['p_test_ZEIT_ENDE'].iloc[x], '%Y-%m-%d %H:%M:%S')
    difference = p_test_ZEIT_ENDE - p_test_ZEIT_ANFANG
    tableContent[6]['p_test_Duration'].iloc[x] = difference
Correct result table: [table image not preserved]
Recommended answer
Remove the loop and apply the functions to the whole Series.
ZEIT_ANFANG = tableContent[6]['p_test_ZEIT_ANFANG'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
ZEIT_ENDE = tableContent[6]['p_test_ZEIT_ENDE'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S'))
tableContent[6]['p_test_Duration'] = ZEIT_ENDE - ZEIT_ANFANG
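As a possible variation on the answer above (not part of the original answer), the string parsing itself can also be handed to pandas with pd.to_datetime, which converts a whole column at once instead of calling datetime.strptime once per row. The sketch below is self-contained and assumes a small made-up DataFrame named df standing in for tableContent[6], since the real table is not shown in the post. The sample values and the extra p_test_Duration_s column are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for tableContent[6]; the real data is not shown in the post.
df = pd.DataFrame({
    "p_test_ZEIT_ANFANG": ["2016-01-01 08:00:00", "2016-01-02 09:30:00"],
    "p_test_ZEIT_ENDE": ["2016-01-01 08:45:30", "2016-01-02 12:00:00"],
})

# Same format string as in the question; passing it explicitly avoids format guessing.
fmt = "%Y-%m-%d %H:%M:%S"

# Parse each column in one call instead of row by row.
start = pd.to_datetime(df["p_test_ZEIT_ANFANG"], format=fmt)
end = pd.to_datetime(df["p_test_ZEIT_ENDE"], format=fmt)

# Subtracting two datetime Series yields a timedelta Series, assigned in one step.
df["p_test_Duration"] = end - start

# Optional: the duration as a plain number of seconds (illustrative extra column).
df["p_test_Duration_s"] = df["p_test_Duration"].dt.total_seconds()
```

Assigning the whole column in a single statement also avoids the per-row write through tableContent[6]['p_test_Duration'].iloc[x], which the question identifies as the slow part.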