How to iterate over rows efficiently in a pandas DataFrame using Python


Problem description

I have a DataFrame that looks like this:

A         B       C
13.06   12.95   -0.11
92.56   104.63  12.07
116.49  219.27  102.78
272.11  487.26  215.15
300.11  780.75  480.64

There are about 1 million records.

I want to create a column D, which is calculated as follows:

The first value of column D is 0, and then:

Col D3 = (D2 + 1) * C3 / B3

Col D4 = (D3 + 1) * C4 / B4

Each value of column D depends on the previous value.

Here is the result:

D
0
0.115358884
0.52281017
0.672397915
1.02955022

I can solve it using a for loop and loc, but it takes a lot of time. Can I solve it in a more efficient, Pythonic way?
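For reference, the recurrence described above can be written as a plain Python loop over lists. This is only a minimal sketch of the calculation itself, not the loc-based code mentioned in the question (which is not shown); the function name `recurrence` and the hard-coded sample lists are assumptions made for illustration.

```python
# Sample values for columns B and C, copied from the table in the question.
B = [12.95, 104.63, 219.27, 487.26, 780.75]
C = [-0.11, 12.07, 102.78, 215.15, 480.64]


def recurrence(b_values, c_values):
    """Return D where D[0] = 0 and D[i] = (D[i-1] + 1) * C[i] / B[i]."""
    d = [0.0]
    # Skip the first row: its D value is fixed at 0 by definition.
    for b, c in zip(b_values[1:], c_values[1:]):
        d.append((d[-1] + 1) * c / b)
    return d


D = recurrence(B, C)
print([round(value, 6) for value in D])
```

On the five sample rows this should agree with the expected values listed in the question, up to rounding. Column A does not appear in the formula, so it is left out here.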

Solution

A recursive calculation like this, where each value depends on the previous one, is not readily vectorisable, so numba is used to speed up the loop:

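import numpy as np
from numba import jit

# nopython=True makes numba raise an error rather than fall back to slower object mode.
@jit(nopython=True)
def f(a, b, c):
    # a is only used for its length; b and c feed the recurrence.
    d = np.empty(a.shape)
    d[0] = 0
    # Each value depends on the previous one, so iterate in order.
    for i in range(1, a.shape[0]):
        d[i] = (d[i-1] + 1) * c[i] / b[i]
    return d

# df is the DataFrame from the question; pass plain NumPy arrays to the compiled function.
df['D'] = f(df['A'].to_numpy(), df['B'].to_numpy(), df['C'].to_numpy())
print(df)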
        A       B       C         D
0   13.06   12.95   -0.11  0.000000
1   92.56  104.63   12.07  0.115359
2  116.49  219.27  102.78  0.522810
3  272.11  487.26  215.15  0.672398
4  300.11  780.75  480.64  1.029550
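The snippet above assumes that `df` already exists and that numba is installed. A self-contained variant could look like the sketch below. The name `running_d`, the sample DataFrame built from the question's five rows, and the fallback decorator used when numba is missing are all assumptions added for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        # Fallback (assumption): run the plain Python function if numba is missing.
        return func


@njit
def running_d(b, c):
    """Compute D with D[0] = 0 and D[i] = (D[i-1] + 1) * c[i] / b[i]."""
    n = b.shape[0]
    d = np.empty(n, dtype=np.float64)
    if n > 0:
        d[0] = 0.0
    for i in range(1, n):
        d[i] = (d[i - 1] + 1.0) * c[i] / b[i]
    return d


# Sample data copied from the question.
df = pd.DataFrame({
    "A": [13.06, 92.56, 116.49, 272.11, 300.11],
    "B": [12.95, 104.63, 219.27, 487.26, 780.75],
    "C": [-0.11, 12.07, 102.78, 215.15, 480.64],
})

# Convert the columns to float64 arrays once, then loop over the arrays.
df["D"] = running_d(
    df["B"].to_numpy(dtype=np.float64),
    df["C"].to_numpy(dtype=np.float64),
)
print(df)
```

Unlike `f` in the answer, this version does not take column A, since the formula never uses it. Without numba the loop still runs on NumPy arrays, just without compilation.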
