Compare 2 consecutive rows and assign increasing value if different (using Pandas)

Problem description

I have a dataframe df_in like so:

import pandas as pd
dic_in = {'A': ['aa','aa','bb','cc','cc','cc','cc','dd','dd','dd','ee'],
          'B': ['200','200','200','400','400','500','700','700','900','900','200'],
          'C': ['da','cs','fr','fs','se','at','yu','j5','31','ds','sz']}
df_in = pd.DataFrame(dic_in)

I would like to examine the two columns A and B in the following way. Consecutive rows whose [['A','B']] values are equal should be assigned the same value, according to this rule: the first row gets 1. If the second row is equal to the first, it also gets 1; if the third is equal to the second, it gets the same value again. Every time two consecutive rows are different, the value to assign increases by 1.

The result should look like this:

     A    B   C  value
0   aa  200  da      1
1   aa  200  cs      1
2   bb  200  fr      2
3   cc  400  fs      3
4   cc  400  se      3
5   cc  500  at      4
6   cc  700  yu      5
7   dd  700  j5      6
8   dd  900  31      7
9   dd  900  ds      7
10  ee  200  sz      8

Can you suggest a smart way to achieve this goal?

Recommended answer

Use shift and any to compare each row with the previous one, producing True wherever the value should change. The first row has no predecessor, so it compares as different and the count starts at 1. Then take the cumulative sum with cumsum to get the increasing value:

df_in['value'] = (df_in[['A', 'B']] != df_in[['A', 'B']].shift()).any(axis=1)
df_in['value'] = df_in['value'].cumsum()

Alternatively, condensing it to one line:

df_in['value'] = (df_in[['A', 'B']] != df_in[['A', 'B']].shift()).any(axis=1).cumsum()
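
To see why this works, it may help to look at the intermediate boolean mask before it is summed. The following is a minimal sketch on the same data (column C is left out for brevity); the variable names keys and changed are only illustrative and do not appear in the original answer.

```python
import pandas as pd

dic_in = {'A': ['aa','aa','bb','cc','cc','cc','cc','dd','dd','dd','ee'],
          'B': ['200','200','200','400','400','500','700','700','900','900','200']}
df_in = pd.DataFrame(dic_in)

keys = df_in[['A', 'B']]

# shift() moves every row down by one position, so row i is compared with row i-1.
# Row 0 is compared with missing values, which never compare equal,
# so the first row always counts as a change.
changed = (keys != keys.shift()).any(axis=1)

# True counts as 1 and False as 0 in a sum, so the running total
# goes up by one at every row that differs from its predecessor.
df_in['value'] = changed.cumsum()

print(changed.tolist())
print(df_in)
```

Using any(axis=1) means a change in either A or B starts a new value; if only one column should trigger the increase, that column alone could be compared instead.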

The resulting output:

     A    B   C  value
0   aa  200  da      1
1   aa  200  cs      1
2   bb  200  fr      2
3   cc  400  fs      3
4   cc  400  se      3
5   cc  500  at      4
6   cc  700  yu      5
7   dd  700  j5      6
8   dd  900  31      7
9   dd  900  ds      7
10  ee  200  sz      8
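
As a sanity check, the vectorized result can be compared against a plain Python loop that applies the rule from the question literally. This is only a sketch: label_runs is a hypothetical helper written for this comparison, not part of pandas or of the original answer.

```python
import pandas as pd

df_in = pd.DataFrame({
    'A': ['aa','aa','bb','cc','cc','cc','cc','dd','dd','dd','ee'],
    'B': ['200','200','200','400','400','500','700','700','900','900','200'],
})

def label_runs(rows):
    """Number runs of identical consecutive items, starting at 1 (hypothetical helper)."""
    labels = []
    current = 0
    previous = object()  # sentinel equal to nothing, so the first row starts run 1
    for row in rows:
        if row != previous:
            current += 1  # a differing row starts a new run
        labels.append(current)
        previous = row
    return labels

# Loop-based reference: treat each (A, B) pair as one item.
expected = label_runs(list(zip(df_in['A'], df_in['B'])))

# Vectorized version from the answer above.
vectorized = (df_in[['A', 'B']] != df_in[['A', 'B']].shift()).any(axis=1).cumsum()

assert vectorized.tolist() == expected
```

The loop is easier to read but visits rows one at a time in Python, while the shift/cumsum form stays inside pandas, which is usually preferable for larger frames.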
