Speeding up an iloc solution within a pandas dataframe

Question

I have the following DataFrame:

import pandas as pd

dates = pd.date_range('20150101', periods=4)
df = pd.DataFrame({'A': [5, 10, 3, 4]}, index=dates)

# Add columns B and C, then seed their first-row values
df.loc[:, 'B'] = 0
df.loc[:, 'C'] = 0
df.iloc[0, 1] = 10
df.iloc[0, 2] = 3

print(df)

Out[69]:

             A   B  C
2015-01-01   5  10  3
2015-01-02  10   0  0
2015-01-03   3   0  0
2015-01-04   4   0  0

I want to apply the following logic to columns B and C:

  • B(k+1) = B(k) - A(k+1)
  • C(k+1) = B(k) + A(k+1)

I can do this using the following code:

for i in range(1, df.shape[0]):
    df.iloc[i, 1] = df.iloc[i-1, 1] - df.iloc[i, 0]  # B(k+1) = B(k) - A(k+1)
    df.iloc[i, 2] = df.iloc[i-1, 1] + df.iloc[i, 0]  # C(k+1) = B(k) + A(k+1)
print(df)

This gives:

             A   B   C
2015-01-01   5  10   3
2015-01-02  10   0  20
2015-01-03   3  -3   3
2015-01-04   4  -7   1

Which is the answer I'm looking for. The problem is that when I apply this to a large DataFrame, it runs slowly. Very slowly. Is there a better way of achieving this?

Recommended answer

Recursive computations like this can be hard to vectorize. numba usually handles them well. If you need to redistribute your code, Cython may be a better choice, as it produces regular C extensions with no extra dependencies.

In [88]: import numba
    ...: import numpy as np

In [89]: @numba.jit(nopython=True)
    ...: def logic(a, b, c):
    ...:     N = len(a)
    ...:     out = np.zeros((N, 2), dtype=np.int64)
    ...:     for i in range(N):
    ...:         if i == 0:
    ...:             out[i, 0] = b[i]
    ...:             out[i, 1] = c[i]
    ...:         else:
    ...:             out[i, 0] = out[i-1,0] - a[i]
    ...:             out[i, 1] = out[i-1,0] + a[i]
    ...:     return out

In [90]: logic(df.A.values, df.B.values, df.C.values)
Out[90]: 
array([[10,  3],
       [ 0, 20],
       [-3,  3],
       [-7,  1]], dtype=int64)

In [91]: df[['B','C']] = logic(df.A.values, df.B.values, df.C.values)

Edit:

As shown in the other answers, this problem can actually be vectorized, which you should probably use.
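The other answers are not reproduced here, so the following is only a sketch of one way the vectorization could look, not necessarily what they proposed. It relies on unrolling the recurrence: B(k) = B(0) - (A(1) + ... + A(k)), which is a cumulative sum, and C(k) = B(k-1) + A(k), which is a shifted copy of B plus A. The function name `logic_vectorized` and its parameters `b0` and `c0` (the seed values of B and C in the first row) are illustrative, and the input is assumed to have at least one row and integer values.

```python
import numpy as np

def logic_vectorized(a, b0, c0):
    """Closed form of B(k+1) = B(k) - A(k+1), C(k+1) = B(k) + A(k+1)."""
    a = np.asarray(a, dtype=np.int64)
    out = np.empty((len(a), 2), dtype=np.int64)

    # B(k) = B(0) - (A(1) + ... + A(k)); A(0) never enters the sum,
    # so zero it out before taking the cumulative sum.
    steps = a.copy()
    steps[0] = 0
    out[:, 0] = b0 - np.cumsum(steps)

    # C(0) is given; for k >= 1, C(k) = B(k-1) + A(k).
    out[0, 1] = c0
    out[1:, 1] = out[:-1, 0] + a[1:]
    return out

# Same data as the question: A = [5, 10, 3, 4], B(0) = 10, C(0) = 3
result = logic_vectorized([5, 10, 3, 4], b0=10, c0=3)
print(result)
```

With the question's frame, the call would be `logic_vectorized(df.A.values, df.B.values[0], df.C.values[0])`, and the result can be assigned to `df[['B', 'C']]`.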
