pandas: apply a function to multiple columns and multiple rows


Question

I have a dataframe with consecutive pixel coordinates in rows, in columns 'xpos' and 'ypos', and I want to calculate the angle in degrees of each path between consecutive pixels. Currently I have the solution presented below, which works fine and is speedy enough for the size of my file, but iterating through all the rows does not seem to be the pandas way to do it. I know how to apply a function to different columns, and how to apply a function to different rows of a column, but I can't figure out how to combine both.

Here is my code:

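import math

import numpy as np
import pandas as pd

df = pd.read_csv('fixations_out.csv')

# compute the saccade angle
temp_list = []
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        # previous row, looked up by label (.loc replaces the removed .ix indexer)
        x2 = df['xpos'].loc[count-1]
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        # the first row has no previous row
        temp_list.append(np.nan)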

and then I insert temp_list into df.

After implementing the tip from the comment, I have:

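# difference between the previous row and the current row
df['diff_x'] = df['xpos'].shift() - df['xpos']
df['diff_y'] = df['ypos'].shift() - df['ypos']

def calc_angle(x):
    try:
        a = abs(180/math.pi * math.atan((x.diff_y)/(x.diff_x)))
        return a
    except ZeroDivisionError:
        # NumPy floats give inf/NaN on division by zero instead of raising,
        # so this branch is normally not reached
        return 0

df['angle_degrees'] = df.apply(calc_angle, axis=1)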

I compared the timings of the three solutions on my df (about 6k rows). Iteration is almost 9 times slower than apply, and about 1500 times slower than doing it without apply:

execution time of the solution with iteration, including inserting the new column back into df: 1.51 s

execution time of the solution without iteration, with apply: 0.17 s

execution time of the accepted answer by EdChum using diff(), without iteration and without apply: 0.001 s

Suggestion: avoid iteration and apply, and always try to use a vectorized calculation ;) It is not only faster, but also more readable.

Recommended answer

You can do this with the following method. I compared the pandas way to your way and it is over 1000 times faster, and that is without adding the list back as a new column! This was done on a 10000-row dataframe.

In [108]:

%%timeit
import math
import numpy as np
df['angle'] = np.abs(180/math.pi * np.arctan((df['ypos'].shift() - df['ypos'])/(df['xpos'].shift() - df['xpos'])))

1000 loops, best of 3: 1.27 ms per loop

In [100]:

%%timeit
temp_list=[]
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        x2 = df['xpos'].loc[count-1]
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        temp_list.append(np.nan)

1 loops, best of 3: 1.29 s per loop

Also, avoid using apply if possible, as it operates row-wise. If you can find a vectorised method that works on the entire series or dataframe, always prefer that.
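To make the difference between row-wise apply and a vectorised expression concrete, here is a minimal sketch on a small made-up dataframe. The coordinates are invented for illustration and the helper name `angle_row` is hypothetical; neither comes from the original post. Both routes should give the same angles, but the row-wise one calls a Python function once per row.

```python
import numpy as np
import pandas as pd

# Made-up pixel coordinates, for illustration only.
df = pd.DataFrame({"xpos": [0.0, 1.0, 3.0, 4.0],
                   "ypos": [0.0, 1.0, 1.0, 3.0]})

# Differences between consecutive rows (first row is NaN).
dx = df["xpos"].diff()
dy = df["ypos"].diff()

# Vectorised: a single expression over whole columns.
vectorised = np.degrees(np.arctan(dy / dx)).abs()

# Row-wise: apply calls this Python function once per row.
def angle_row(row):  # hypothetical helper, not from the original post
    return abs(np.degrees(np.arctan(row["dy"] / row["dx"])))

row_wise = pd.DataFrame({"dx": dx, "dy": dy}).apply(angle_row, axis=1)

print(pd.DataFrame({"vectorised": vectorised, "row_wise": row_wise}))
```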

Update

Since you are just subtracting the previous row, there is a built-in method for this, diff, which results in even faster code:

In [117]:

%%timeit
import math
import numpy as np
df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1)/df['xpos'].diff(1)))

1000 loops, best of 3: 1.01 ms per loop
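As a quick check of what diff does in the update above, the following sketch compares it with the explicit shift-based subtraction on a made-up series (the values are invented for illustration). Note that `s.shift() - s`, the form used in the question, has the opposite sign, which does not matter here because the absolute value of the angle is taken.

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only.
s = pd.Series([10.0, 13.0, 13.0, 20.0])

by_shift = s - s.shift()    # current row minus previous row
by_diff = s.diff(1)         # the same subtraction via the built-in method
reversed_sign = s.shift() - s  # previous minus current, as in the question

print(pd.DataFrame({"by_shift": by_shift,
                    "by_diff": by_diff,
                    "reversed_sign": reversed_sign}))
```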

Another update

There is also a built-in method for series and dataframe division, div. This shaves off more time, and I achieve sub-1 ms time:

In [9]:

%%timeit
import math
import numpy as np
df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1).div(df['xpos'].diff(1))))

1000 loops, best of 3: 951 µs per loop
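One case the division-based expressions do not handle cleanly is a repeated point, where both differences are zero and the quotient is 0/0, giving NaN. A possible variant, not taken from the original answer, is to skip the division and use `np.arctan2` on the absolute differences. This is a sketch on made-up coordinates; returning 0 degrees for a repeated point is an assumption about what you want, not something the original post specifies.

```python
import numpy as np
import pandas as pd

# Made-up pixel coordinates: a diagonal step, a vertical step,
# a horizontal step, and a repeated point.
df = pd.DataFrame({"xpos": [0.0, 2.0, 2.0, 5.0, 5.0],
                   "ypos": [0.0, 2.0, 6.0, 6.0, 6.0]})

# Absolute differences keep the result in the 0..90 degree range,
# matching abs(atan(dy/dx)) from the question.
dx = df["xpos"].diff().abs()
dy = df["ypos"].diff().abs()

# arctan2 takes dy and dx separately, so no division is performed.
df["angle"] = np.degrees(np.arctan2(dy, dx))

print(df)
```

The first row stays NaN because it has no previous row, just as with the diff-based versions.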
