Counting within Pandas apply() function
Question
I'm trying to iterate through a DataFrame and, whenever a value changes, increment a counter and set a new column to the counter's current value. I'm able to get this to work using a global counter, like so:
prev_row = None  # the globals must exist before the first call
k = 0

def change_ind(row):
    global prev_row
    global k
    if row['rep'] != prev_row:
        k = k+1
        prev_row = row['rep']
    return k
But when I try to pass arguments to the apply function, as below, it no longer works. It seems to reset the values of k and prev_row each time it operates on a new row. Is there a way to pass arguments to the function and get the result I'm looking for? Or is there a better way to do this altogether?
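def change_ind(row, k, prev_row):
    # k and prev_row are local parameters here: rebinding them does not
    # persist between calls, so every row starts from the values passed in
    if row != prev_row:
        k = k+1
        prev_row = row
    return k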
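Why the second version appears to reset: k and prev_row are ordinary parameters, so each call receives the same starting values, and assigning to them inside the function only rebinds local names that vanish when the call returns. If you still want a row-by-row function without globals, one option is to keep the state in a closure. The following is a minimal sketch, assuming the function is applied with Series.apply to the 'rep' column; make_change_counter is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def make_change_counter(start=-1):
    """Return a function that counts how many times its input has changed.

    k and prev live in the enclosing scope, so unlike plain parameters
    they persist from one call to the next.
    """
    k = start
    prev = object()  # sentinel that never equals a real value

    def change_ind(value):
        nonlocal k, prev
        if value != prev:
            k += 1
            prev = value
        return k

    return change_ind


df = pd.DataFrame({'rep': [0, 1, 1, 1, 2, 3, 2, 3, 4, 5, 1]})
# A fresh counter is created for each column you process.
df['rep_f'] = df['rep'].apply(make_change_counter())
print(df['rep_f'].tolist())  # [0, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8]
```

This still calls a Python function once per row, so it is only a way to avoid globals; it is not faster than the original loop.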
Answer
You can achieve the same thing using shift and cumsum. Both are vectorised operations, so this will be significantly faster than looping:
In [107]:
import pandas as pd
df = pd.DataFrame({'rep':[0,1,1,1,2,3,2,3,4,5,1]})
df
Out[107]:
rep
0 0
1 1
2 1
3 1
4 2
5 3
6 2
7 3
8 4
9 5
10 1
In [108]:
df['rep_f'] = (df['rep']!=df['rep'].shift()).cumsum()-1
df
Out[108]:
rep rep_f
0 0 0
1 1 1
2 1 1
3 1 1
4 2 2
5 3 3
6 2 4
7 3 5
8 4 6
9 5 7
10 1 8
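The one-liner works in three steps: df['rep'].shift() moves every value down one row (the first row becomes NaN), comparing with != marks True wherever a value differs from the row before it, and cumsum counts those True values as it goes, with -1 making the first run start at 0. The sketch below spells out those steps in a reusable function and checks it against a plain loop. The names change_counter and change_counter_loop are hypothetical, chosen for illustration.

```python
import pandas as pd


def change_counter(s: pd.Series) -> pd.Series:
    """Vectorised change counter: 0 for the first run, +1 at every change."""
    # True wherever the value differs from the previous row. The first row
    # is compared against the NaN produced by shift(), so it is always True.
    changed = s.ne(s.shift())
    # Running count of changes; subtract 1 so the first run is numbered 0.
    return changed.cumsum() - 1


def change_counter_loop(values):
    """Plain-Python reference version of the same counter, for comparison."""
    out, k, prev = [], -1, object()  # object() is a sentinel unequal to any value
    for v in values:
        if v != prev:
            k += 1
            prev = v
        out.append(k)
    return out


df = pd.DataFrame({'rep': [0, 1, 1, 1, 2, 3, 2, 3, 4, 5, 1]})
df['rep_f'] = change_counter(df['rep'])
# Both versions should agree on this data.
assert df['rep_f'].tolist() == change_counter_loop(df['rep'].tolist())
print(df['rep_f'].tolist())  # [0, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8]
```

One caveat: NaN never compares equal to itself, so in both versions every NaN row starts a new run.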