pandas apply function to multiple columns and multiple rows
Question
I have a dataframe with consecutive pixel coordinates in rows, in the columns 'xpos' and 'ypos', and I want to calculate the angle in degrees of each path between consecutive pixels. Currently I have the solution presented below, which works fine and is speedy enough for the size of my file, but iterating through all the rows does not seem to be the pandas way to do it. I know how to apply a function to different columns, and how to apply functions to different rows of a column, but I can't figure out how to combine both.
Here is my code:
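import math
import numpy as np
import pandas as pd

df = pd.read_csv('fixations_out.csv')
# calculate the saccade angle between each row and the previous one
temp_list = []
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        # .loc replaces the original .ix, which was removed in pandas 1.0
        x2 = df['xpos'].loc[count-1]
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        # the first row has no previous row
        temp_list.append(np.nan)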
and then I insert the temporary list into df.
After implementing the tip from the comment, I have:
df['diff_x'] = df['xpos'].shift() - df['xpos']
df['diff_y'] = df['ypos'].shift() - df['ypos']

def calc_angle(x):
    # x is one row of df; diff_x and diff_y are the differences to the previous row
    try:
        a = abs(180/math.pi * math.atan((x.diff_y)/(x.diff_x)))
        return a
    except ZeroDivisionError:
        return 0

df['angle_degrees'] = df.apply(calc_angle, axis=1)
I compared the run times of the three solutions on my df (about 6k rows): the iteration is almost 9 times slower than apply, and about 1500 times slower than doing it without apply:
execution time of the solution with iteration, including inserting the new column back into df: 1.51 s
execution time of the solution without iteration, using apply: 0.17 s
execution time of the accepted answer by EdChum using diff(), without iteration and without apply: 0.001 s
Suggestion: do not use iteration or apply, and always try to use vectorized calculation ;) It is not only faster, but also more readable.
Recommended answer
You can do this with the following method. I compared the pandas way to your way and it is over 1000 times faster, and that is without adding the list back as a new column! This was done on a 10000-row dataframe:
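In [108]:
%%timeit
import math
import numpy as np
# parentheses make sure both differences are taken before the division;
# dy/dx matches the formula in the question
df['angle'] = np.abs(180/math.pi * np.arctan((df['ypos'].shift() - df['ypos'])/(df['xpos'].shift() - df['xpos'])))
1000 loops, best of 3: 1.27 ms per loop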
In [100]:
%%timeit
temp_list = []
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        # .loc replaces the original .ix, which was removed in pandas 1.0
        x2 = df['xpos'].loc[count-1]
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        temp_list.append(np.nan)
1 loops, best of 3: 1.29 s per loop
Also, if possible, avoid using apply, as it operates row-wise; if you can find a vectorised method that works on the entire Series or DataFrame, always prefer that.
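To see that the vectorised form really computes the same thing as a row-by-row loop, here is a minimal sketch on a tiny made-up path. The sample coordinates and the helper names angles_loop and angles_vectorised are assumptions for illustration, not part of the original post; the formula is the dy/dx one from the question.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical sample path; the real data would come from fixations_out.csv.
df = pd.DataFrame({"xpos": [0.0, 1.0, 3.0, 4.0, 8.0],
                   "ypos": [0.0, 1.0, 1.0, 3.0, 6.0]})


def angles_loop(frame):
    """Row-by-row reference: angle between each point and the previous one."""
    angles = [np.nan]  # the first point has no predecessor
    for i in range(1, len(frame)):
        dx = frame["xpos"].iloc[i - 1] - frame["xpos"].iloc[i]
        dy = frame["ypos"].iloc[i - 1] - frame["ypos"].iloc[i]
        angles.append(abs(math.degrees(math.atan(dy / dx))))
    return pd.Series(angles, index=frame.index)


def angles_vectorised(frame):
    """Whole-column version: no Python-level loop over the rows."""
    dx = frame["xpos"].shift() - frame["xpos"]
    dy = frame["ypos"].shift() - frame["ypos"]
    return np.degrees(np.arctan(dy / dx)).abs()


loop_result = angles_loop(df)
vec_result = angles_vectorised(df)
print(vec_result.round(2).tolist())  # [nan, 45.0, 0.0, 63.43, 36.87]
```

Both functions return a Series aligned with df, so either result can be assigned directly as a new column. The sample path deliberately has no two consecutive points with the same x, so the loop never divides by zero.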
Update
Seeing as you are just doing a subtraction from the previous row, there is a built-in method for this, diff, which results in even faster code:
In [117]:
%%timeit
import math
import numpy as np
# dy/dx, matching the formula in the question
df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1)/df['xpos'].diff(1)))
1000 loops, best of 3: 1.01 ms per loop
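As a quick illustration of what diff does, the sketch below compares it with the explicit shift-based subtraction on a few made-up x coordinates (the values are an assumption, chosen only to make the arithmetic easy to follow).

```python
import pandas as pd

# Hypothetical x coordinates of four consecutive pixels.
xpos = pd.Series([2.0, 5.0, 4.0, 4.0])

manual = xpos - xpos.shift()   # current row minus previous row
built_in = xpos.diff(1)        # the same subtraction via the built-in method

print(built_in.tolist())       # [nan, 3.0, -1.0, 0.0]
```

Note that diff(1) is current minus previous, the opposite sign of shift() minus current. For the angle this makes no difference: both differences flip sign together, so their ratio is unchanged.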
Another update
There is also a built-in method for Series and DataFrame division, div; this shaves off more time and I achieve sub-1 ms times:
In [9]:
%%timeit
import math
import numpy as np
# dy/dx, matching the formula in the question
df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1).div(df['xpos'].diff(1))))
1000 loops, best of 3: 951 µs per loop
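One thing worth knowing about the vectorised division is how it behaves when a difference is zero. The sketch below uses a made-up path (an assumption, not data from the post) with a vertical step and a repeated point, and takes dy/dx as in the question's formula. Unlike plain Python floats, float Series division does not raise ZeroDivisionError; it yields inf or NaN instead.

```python
import numpy as np
import pandas as pd

# Hypothetical path: a diagonal step, a vertical step, then a repeated point.
df = pd.DataFrame({"xpos": [0.0, 1.0, 1.0, 1.0],
                   "ypos": [0.0, 1.0, 3.0, 3.0]})

dx = df["xpos"].diff()
dy = df["ypos"].diff()

ratio = dy.div(dx)  # 1.0 for the diagonal, inf for 2/0, NaN for 0/0
angle = np.degrees(np.arctan(ratio)).abs()

print(angle.round(2).tolist())  # [nan, 45.0, 90.0, nan]
```

So a vertical step comes out as 90 degrees, while a repeated point gives NaN, which seems reasonable since no direction is defined there. If a different convention is wanted (for example 0 for repeated points), something like angle.fillna(0) could be applied afterwards, at the cost of also filling the first row.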