Can I create a new column based on when the value changes in another column?


Question

Suppose I have this df:

print(df)
              DATE_TIME  A  B
0   10/08/2016 12:04:56  1  5
1   10/08/2016 12:04:58  1  6
2   10/08/2016 12:04:59  2  3
3   10/08/2016 12:05:00  2  2
4   10/08/2016 12:05:01  3  4
5   10/08/2016 12:05:02  3  6
6   10/08/2016 12:05:03  1  3
7   10/08/2016 12:05:04  1  2
8   10/08/2016 12:05:05  2  4
9   10/08/2016 12:05:06  2  6
10  10/08/2016 12:05:07  3  4
11  10/08/2016 12:05:08  3  2

The values in column ['A'] repeat over time, but I need a column that assigns a new ID each time the value changes, so that I end up with something like the following df:

print(df)
               DATE_TIME  A  B  C
 0   10/08/2016 12:04:56  1  5  1
 1   10/08/2016 12:04:58  1  6  1
 2   10/08/2016 12:04:59  2  3  2
 3   10/08/2016 12:05:00  2  2  2
 4   10/08/2016 12:05:01  3  4  3
 5   10/08/2016 12:05:02  3  6  3
 6   10/08/2016 12:05:03  1  3  4
 7   10/08/2016 12:05:04  1  2  4
 8   10/08/2016 12:05:05  2  4  5
 9   10/08/2016 12:05:06  2  6  5
 10  10/08/2016 12:05:07  3  4  6
 11  10/08/2016 12:05:08  3  2  6

Is there a way to do this in Python? I am still very new to this and hoped to find something in pandas that could help, but I have not found anything yet. In my original dataframe the values in column ['A'] change at irregular intervals, approximately every ten minutes, and not every two rows as in my example. Does anybody have an idea how I could approach this task? Thank you.

Recommended answer

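You can use the shift-cumsum pattern: compare each value of A with the previous row's value, then take the cumulative sum of the resulting booleans.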

df['C'] = (df.A != df.A.shift()).cumsum()

>>> df
              DATE_TIME  A  B  C
0  10/08/2016  12:04:56  1  5  1
1  10/08/2016  12:04:58  1  6  1
2  10/08/2016  12:04:59  2  3  2
3  10/08/2016  12:05:00  2  2  2
4  10/08/2016  12:05:01  3  4  3
5  10/08/2016  12:05:02  3  6  3
6  10/08/2016  12:05:03  1  3  4
7  10/08/2016  12:05:04  1  2  4
8  10/08/2016  12:05:05  2  4  5
9  10/08/2016  12:05:06  2  6  5
10 10/08/2016  12:05:07  3  4  6
11 10/08/2016  12:05:08  3  2  6
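To see why the one-liner works, it may help to split it into its intermediate steps. The following is a minimal sketch that rebuilds only column A from the question; the names `prev` and `changed` are illustrative and not part of the original answer.

```python
import pandas as pd

# Column A from the question; the other columns are not needed for this step.
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]})

# Previous row's value of A; the first row has no predecessor, so it becomes NaN.
prev = df["A"].shift()

# True wherever A differs from the previous row, i.e. where a new run starts.
# The first row compares against NaN, so it is always flagged as a start.
changed = df["A"] != prev

# Summing the booleans cumulatively counts the run starts seen so far,
# which gives every run of equal values its own ID, starting at 1.
df["C"] = changed.cumsum()

# Show the intermediate columns next to the result.
print(df.assign(prev=prev, changed=changed))
```

Because the comparison looks only at adjacent rows, the runs can be of any length, so irregular intervals like those in the original data are not a problem. One caveat: NaN never compares equal to NaN, so if A itself contains missing values, each such row would start a new ID.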

As a side note, this is a popular pattern for grouping. For example, to get the average B value of each such group:

df.groupby((df.A != df.A.shift()).cumsum()).B.mean()
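As a slightly extended sketch of the same grouping idea, the run IDs can be stored in a named Series and reused to compute several per-run statistics at once. The names `run_id`, `summary`, `B_mean`, and `rows` are assumptions made for illustration; the data is columns A and B from the question.

```python
import pandas as pd

# Columns A and B from the question.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "B": [5, 6, 3, 2, 4, 6, 3, 2, 4, 6, 4, 2],
})

# Run IDs via shift-cumsum; renamed so the group index gets a clear label.
run_id = (df["A"] != df["A"].shift()).cumsum().rename("run")

# One row per run: the A value of the run, the mean of B, and the run length.
summary = df.groupby(run_id).agg(
    A=("A", "first"),
    B_mean=("B", "mean"),
    rows=("B", "size"),
)

print(summary)
```

Grouping on `df.A` directly would merge the two separate runs of each value into one group; grouping on the run IDs keeps them apart.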
