Speeding up an iloc solution within a pandas dataframe
Question
I have the following DataFrame:
import pandas as pd

dates = pd.date_range('20150101', periods=4)
df = pd.DataFrame({'A': [5, 10, 3, 4]}, index=dates)
# Add B and C, then set their initial values in the first row
df.loc[:, 'B'] = 0
df.loc[:, 'C'] = 0
df.iloc[0, 1] = 10
df.iloc[0, 2] = 3
print(df)
Out[69]:
             A   B  C
2015-01-01   5  10  3
2015-01-02  10   0  0
2015-01-03   3   0  0
2015-01-04   4   0  0
I would like to implement the following logic for the columns B and C:
- B(k+1) = B(k) - A(k+1)
- C(k+1) = B(k) + A(k+1)
I can do this using the following code:
for i in range(1, df.shape[0]):
    # Both updates read B from the previous row
    df.iloc[i, 1] = df.iloc[i-1, 1] - df.iloc[i, 0]
    df.iloc[i, 2] = df.iloc[i-1, 1] + df.iloc[i, 0]
print(df)
This gives:
             A   B   C
2015-01-01   5  10   3
2015-01-02  10   0  20
2015-01-03   3  -3   3
2015-01-04   4  -7   1
This is the answer I'm looking for. The problem is that when I apply this to a DataFrame with a large dataset, it runs slowly. Very slowly. Is there a better way of achieving this?
Recommended answer
Recursive things like this can be hard to vectorize. numba usually handles them well. If you need to redistribute your code, cython may be a better choice, as it produces regular C extensions with no extra dependencies.
In [88]: import numba
    ...: import numpy as np
In [89]: @numba.jit(nopython=True)
    ...: def logic(a, b, c):
    ...:     N = len(a)
    ...:     # Column 0 of out holds B, column 1 holds C
    ...:     out = np.zeros((N, 2), dtype=np.int64)
    ...:     for i in range(N):
    ...:         if i == 0:
    ...:             out[i, 0] = b[i]
    ...:             out[i, 1] = c[i]
    ...:         else:
    ...:             out[i, 0] = out[i-1, 0] - a[i]
    ...:             out[i, 1] = out[i-1, 0] + a[i]
    ...:     return out
In [90]: logic(df.A.values, df.B.values, df.C.values)
Out[90]:
array([[10,  3],
       [ 0, 20],
       [-3,  3],
       [-7,  1]], dtype=int64)
In [91]: df[['B','C']] = logic(df.A.values, df.B.values, df.C.values)
Edit:
As shown in the other answers, this problem can actually be vectorized, and you should probably use that approach instead.
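The other answers are not reproduced here, so the following is only a sketch of one way the recurrence might be vectorized, not necessarily the approach those answers took. It relies on the observation that B(k) equals B(0) minus the running sum of A(1)..A(k), and that C(k) only depends on B(k-1) and A(k). The variable names (a, b0, c0, b, c) are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question, with the initial B and C values.
dates = pd.date_range('20150101', periods=4)
df = pd.DataFrame({'A': [5, 10, 3, 4],
                   'B': [10, 0, 0, 0],
                   'C': [3, 0, 0, 0]}, index=dates)

a = df['A'].to_numpy()
b0 = df['B'].iloc[0]
c0 = df['C'].iloc[0]

# B(k) = B(0) - (A(1) + ... + A(k)): a cumulative sum that skips A(0).
b = b0 - (np.cumsum(a) - a[0])

# C(k) = B(k-1) + A(k) for k >= 1; C(0) keeps its initial value.
c = np.empty_like(b)
c[0] = c0
c[1:] = b[:-1] + a[1:]

df['B'] = b
df['C'] = c
print(df)
```

On the four-row example this reproduces the B and C columns from the loop in the question. Because it uses only whole-array NumPy operations, it avoids the per-row iloc calls that make the original loop slow.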