Speeding up an iloc solution within a pandas dataframe

Views: 53 Published: 2020/10/17 0:14:23 Tags: python pandas dataframe


Question

I have the following DataFrame:

import pandas as pd

dates = pd.date_range('20150101', periods=4)
df = pd.DataFrame({'A': [5, 10, 3, 4]}, index=dates)

# Add columns B and C, then set their starting values in the first row
df.loc[:, 'B'] = 0
df.loc[:, 'C'] = 0
df.iloc[0, 1] = 10
df.iloc[0, 2] = 3

print(df)

Out[69]:

             A   B  C
2015-01-01   5  10  3
2015-01-02  10   0  0
2015-01-03   3   0  0
2015-01-04   4   0  0

I want to apply the following recurrence to columns B and C:

B(k+1) = B(k) - A(k+1)
C(k+1) = B(k) + A(k+1)

I can do this using the following code:
for i in range(1, df.shape[0]):
    df.iloc[i, 1] = df.iloc[i-1, 1] - df.iloc[i, 0]
    df.iloc[i, 2] = df.iloc[i-1, 1] + df.iloc[i, 0]
print(df)

This gives:
             A   B   C
2015-01-01   5  10   3
2015-01-02  10   0  20
2015-01-03   3  -3   3
2015-01-04   4  -7   1

This is the answer I'm looking for. The problem is that when I apply this to a DataFrame with a large dataset, it runs slowly. Very slowly. Is there a better way of achieving this?

Recommended answer
Recursive computations like this can be hard to vectorize. numba usually handles them well. If you need to redistribute your code, cython may be a better choice, as it produces regular C extensions with no extra dependencies.
In [88]: import numba
    ...: import numpy as np

In [89]: @numba.jit(nopython=True)
    ...: def logic(a, b, c):
    ...:     N = len(a)
    ...:     out = np.zeros((N, 2), dtype=np.int64)
    ...:     for i in range(N):
    ...:         if i == 0:
    ...:             out[i, 0] = b[i]
    ...:             out[i, 1] = c[i]
    ...:         else:
    ...:             out[i, 0] = out[i-1,0] - a[i]
    ...:             out[i, 1] = out[i-1,0] + a[i]
    ...:     return out

In [90]: logic(df.A.values, df.B.values, df.C.values)
Out[90]: 
array([[10,  3],
       [ 0, 20],
       [-3,  3],
       [-7,  1]], dtype=int64)

In [91]: df[['B','C']] = logic(df.A.values, df.B.values, df.C.values)

Edit:
As shown in the other answers, this problem can actually be vectorized, and you should probably use that approach instead.
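
The other answers are not reproduced on this page, so the following is only a minimal sketch of one possible vectorization, not necessarily the one they propose. It relies on the observation that B(k) = B(0) - (A(1) + ... + A(k)), which is a cumulative sum, and that C(k) then follows from the shifted B. The function name `vectorized_logic` and its parameters `b0` and `c0` are hypothetical, and the sketch assumes a non-empty integer column A.

```python
import numpy as np
import pandas as pd

def vectorized_logic(a, b0, c0):
    """Hypothetical helper: closed form of
    B(k+1) = B(k) - A(k+1) and C(k+1) = B(k) + A(k+1)."""
    a = np.asarray(a)
    # B(k) = B(0) - (A(1) + ... + A(k)); A(0) does not enter the sum
    steps = a.copy()
    steps[0] = 0
    b = b0 - np.cumsum(steps)
    # C(0) is given; for k >= 1, C(k) = B(k-1) + A(k)
    c = np.empty_like(b)
    c[0] = c0
    c[1:] = b[:-1] + a[1:]
    return b, c

dates = pd.date_range('20150101', periods=4)
df = pd.DataFrame({'A': [5, 10, 3, 4]}, index=dates)
df['B'], df['C'] = vectorized_logic(df['A'].to_numpy(), b0=10, c0=3)
print(df)
```

On the four-row example from the question this reproduces the B and C columns computed by the iloc loop, without any Python-level iteration over rows.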
