Optimizing Python code using NumPy and Pandas
Problem description
I have the following working code:
import numpy as np
import pandas as pd
colum1 = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]
colum2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
colum3 = [0.85, 0.80, 0.80, 0.80, 0.85, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
colum4 = [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05, 455.40, 319.03, 190.86, 97.07, 26.96, 0.00]
df = pd.DataFrame({
    'colum1': colum1,
    'colum2': colum2,
    'colum3': colum3,
    'colum4': colum4,
})
df['result'] = 0
for i in range(len(colum2)):
    df['result'] = np.where(
        df['colum2'] <= 5,
        np.where(
            df['colum2'] == 1,
            df['colum4'],
            np.where(
                (df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3']))) > 0,
                (df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3']))),
                0
            )
        ),
        np.where(
            (df['colum4'] - (df['result'].shift(1) * df['colum1'])) > 0,
            (df['colum4'] - (df['result'].shift(1) * df['colum1'])),
            0
        )
    )
I need to perform the same operation without resorting to a for loop. This would be very helpful, since I am working with thousands of records and the loop is very slow.
My expected result is as follows:
    colum1  colum2  colum3   colum4       result
0     0.05       1    0.85  1743.85  1743.850000
1     0.05       2    0.80  1485.58  1415.826000
2     0.05       3    0.80  1250.07  1193.436960
3     0.05       4    0.80  1021.83   974.092522
4     0.05       5    0.85   818.96   777.561068
5     0.05       6    0.00   628.05   589.171947
6     0.05       7    0.00   455.40   425.941403
7     0.05       8    0.00   319.03   297.732930
8     0.05       9    0.00   190.86   175.973354
9     0.05      10    0.00    97.07    88.271332
10    0.05      11    0.00    26.96    22.546433
11    0.05      12    0.00     0.00     0.000000
Recommended answer
The first step is to remove the loop over the index and replace the tests for values greater than 0 with np.maximum. This works because np.where(a > 0, a, 0) is, for our purposes, equivalent to np.maximum(0, a).
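The equivalence can be checked on a small array. The snippet below is a minimal sketch with made-up values (the arrays a and b are illustrative, not from the question). It also shows the one case where the two forms differ: np.maximum propagates NaN, while the np.where test turns it into 0. In the code here a NaN only comes from shift(1) on the first row, which the colum2 == 1 branch overwrites in the sample data, which is presumably what "for our purposes" refers to.

```python
import numpy as np

# Illustrative values only: negative, zero and positive entries
a = np.array([-2.5, 0.0, 3.0, 7.25])

clipped_where = np.where(a > 0, a, 0)   # keep positives, replace the rest with 0
clipped_max = np.maximum(0, a)          # element-wise maximum against 0

print(np.array_equal(clipped_where, clipped_max))

# The two forms differ only when NaN is present
b = np.array([np.nan, -1.0, 2.0])
where_with_nan = np.where(b > 0, b, 0)  # NaN > 0 is False, so NaN becomes 0
max_with_nan = np.maximum(0, b)         # np.maximum propagates NaN
print(where_with_nan)
print(max_with_nan)
```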
At the same time, define the longer expressions separately to make your code readable:
s1 = df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3']))
s2 = df['colum4'] - (df['result'].shift(1) * df['colum1'])
df['result'] = np.where(df['colum2'] <= 5,
                        np.where(df['colum2'] == 1, df['colum4'],
                                 np.maximum(0, s1)),
                        np.maximum(0, s2))
The next step is to use np.select to remove the nested np.where statements:
m1 = df['colum2'] <= 5
m2 = df['colum2'] == 1
conds = [m1 & m2, m1 & ~m2]
choices = [df['colum4'], np.maximum(0, s1)]
df['result'] = np.select(conds, choices, np.maximum(0, s2))
This version will be more manageable.
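One caveat: s1 and s2 both read df['result'].shift(1), so each row depends on the result of the row above it. A single vectorized pass only sees whatever df['result'] held beforehand, which appears to be why the code in the question repeats the assignment len(colum2) times. The following is a minimal sketch, not part of the original answer, of one way to get the expected table in a single pass. It assumes the intended rule is the row-by-row recurrence implied by the expected output in the question. The function name sequential_result is hypothetical.

```python
import numpy as np
import pandas as pd


def sequential_result(df):
    """Hypothetical helper: compute 'result' row by row in a single pass."""
    c1 = df['colum1'].to_numpy(dtype=float)
    c2 = df['colum2'].to_numpy()
    c3 = df['colum3'].to_numpy(dtype=float)
    c4 = df['colum4'].to_numpy(dtype=float)
    # Per-row multiplier: colum1 * colum3 while colum2 <= 5, colum1 afterwards
    factor = np.where(c2 <= 5, c1 * c3, c1)
    out = np.zeros(len(df))
    for i in range(len(df)):
        if c2[i] == 1:
            out[i] = c4[i]
        elif i > 0:
            # Uses the result already computed for the previous row
            out[i] = max(0.0, c4[i] - out[i - 1] * factor[i])
        # A first row with colum2 != 1 has no previous result and stays 0
    return out


df = pd.DataFrame({
    'colum1': [0.05] * 12,
    'colum2': list(range(1, 13)),
    'colum3': [0.85, 0.80, 0.80, 0.80, 0.85, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    'colum4': [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05,
               455.40, 319.03, 190.86, 97.07, 26.96, 0.00],
})
df['result'] = sequential_result(df)
print(df)
```

The loop is still there, but it runs once over plain NumPy arrays instead of recomputing the whole column on every iteration, so the work grows linearly with the number of rows.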