How to iterate over rows efficiently in a pandas DataFrame using Python
Problem description
I have a DataFrame that looks like this:
A B C
13.06 12.95 -0.11
92.56 104.63 12.07
116.49 219.27 102.78
272.11 487.26 215.15
300.11 780.75 480.64
There are about 1 million records.
I want to create a column D, which is calculated as follows.
The first value of column D is 0. After that, using spreadsheet-style cell references (D2 is the first value):
D3 = (D2 + 1) * C3 / B3
D4 = (D3 + 1) * C4 / B4
In other words, each value in column D depends on the previous value.
Here is the result:
D
0
0.115358884
0.52281017
0.672397915
1.02955022
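As a quick sanity check of the recurrence, here is a minimal sketch that applies it to the five sample rows using plain Python lists. The list names `B`, `C` and `D` simply mirror the column names; nothing here depends on pandas.

```python
# Sample values of columns B and C from the table above.
B = [12.95, 104.63, 219.27, 487.26, 780.75]
C = [-0.11, 12.07, 102.78, 215.15, 480.64]

D = [0.0]  # the first value of D is defined as 0
for i in range(1, len(B)):
    # Each value is built from the previous one:
    # D[i] = (D[i-1] + 1) * C[i] / B[i]
    D.append((D[i - 1] + 1) * C[i] / B[i])

for value in D:
    print(round(value, 9))
```

The printed values should match the expected column D above up to rounding.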
I can solve this with a for loop and loc, but it takes a lot of time. Is there a more efficient, Pythonic way to do it?
Recursive calculations like this are not vectorisable, because each value depends on the previous one. To improve performance, you can use numba:
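import numpy as np
from numba import jit

# nopython=True asks numba to compile the function without
# falling back to the Python interpreter.
@jit(nopython=True)
def f(a, b, c):
    d = np.empty(a.shape)
    d[0] = 0  # the first value of D is defined as 0
    for i in range(1, a.shape[0]):
        # each value depends on the previous one
        d[i] = (d[i-1] + 1) * c[i] / b[i]
    return d

# df is the DataFrame from the question; pass its columns as NumPy arrays.
df['D'] = f(df['A'].to_numpy(), df['B'].to_numpy(), df['C'].to_numpy())
print(df)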
A B C D
0 13.06 12.95 -0.11 0.000000
1 92.56 104.63 12.07 0.115359
2 116.49 219.27 102.78 0.522810
3 272.11 487.26 215.15 0.672398
4 300.11 780.75 480.64 1.029550
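If installing numba is not an option, one possible fallback is to keep the loop in pure Python but avoid per-row `loc` lookups, which are likely a large part of the cost in the original approach. The sketch below is an assumption-laden alternative, not part of the answer above: the helper name `running_d` is hypothetical, it takes plain lists, and it relies on `itertools.accumulate` with the `initial` argument (Python 3.8+). It will probably still be slower than the compiled numba version.

```python
from itertools import accumulate


def running_d(b, c):
    """Return D where D[0] = 0 and D[i] = (D[i-1] + 1) * c[i] / b[i]."""
    if not b:
        return []

    def step(prev, pair):
        # pair is (c[i], b[i]) for the current row
        c_i, b_i = pair
        return (prev + 1) * c_i / b_i

    # Row 0 is fixed at 0, so the recurrence starts from row 1.
    return list(accumulate(zip(c[1:], b[1:]), step, initial=0.0))


# Sample values from the question.
b = [12.95, 104.63, 219.27, 487.26, 780.75]
c = [-0.11, 12.07, 102.78, 215.15, 480.64]
d = running_d(b, c)
print([round(x, 6) for x in d])
```

With a DataFrame like the one in the question, the columns could be converted once up front, for example `df['D'] = running_d(df['B'].tolist(), df['C'].tolist())`.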