Can I create a new column based on when the value changes in another column?
Problem description
Suppose I have this df:
print(df)
DATE_TIME A B
0 10/08/2016 12:04:56 1 5
1 10/08/2016 12:04:58 1 6
2 10/08/2016 12:04:59 2 3
3 10/08/2016 12:05:00 2 2
4 10/08/2016 12:05:01 3 4
5 10/08/2016 12:05:02 3 6
6 10/08/2016 12:05:03 1 3
7 10/08/2016 12:05:04 1 2
8 10/08/2016 12:05:05 2 4
9 10/08/2016 12:05:06 2 6
10 10/08/2016 12:05:07 3 4
11 10/08/2016 12:05:08 3 2
The values in column ['A'] repeat over time. I need a new column that gets a new ID each time the value in A changes, so that I end up with something like the following df:
print(df)
DATE_TIME A B C
0 10/08/2016 12:04:56 1 5 1
1 10/08/2016 12:04:58 1 6 1
2 10/08/2016 12:04:59 2 3 2
3 10/08/2016 12:05:00 2 2 2
4 10/08/2016 12:05:01 3 4 3
5 10/08/2016 12:05:02 3 6 3
6 10/08/2016 12:05:03 1 3 4
7 10/08/2016 12:05:04 1 2 4
8 10/08/2016 12:05:05 2 4 5
9 10/08/2016 12:05:06 2 6 5
10 10/08/2016 12:05:07 3 4 6
11 10/08/2016 12:05:08 3 2 6
Is there a way to do this with Python? I am still very new to this and hoped to find something in pandas that could help, but I have not found anything yet. In my original dataframe, the values in column ['A'] change at irregular intervals, roughly every ten minutes, rather than every two rows as in my example. Does anybody have an idea how I could approach this task? Thank you.
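Recommended answer
You can use the shift-cumsum pattern. Comparing column A with a copy of itself shifted down by one row gives True wherever the value differs from the previous row, and the cumulative sum of those True values goes up by one at every change, which gives each run of equal values its own ID.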
df['C'] = (df.A != df.A.shift()).cumsum()
>>> df
DATE_TIME A B C
0 10/08/2016 12:04:56 1 5 1
1 10/08/2016 12:04:58 1 6 1
2 10/08/2016 12:04:59 2 3 2
3 10/08/2016 12:05:00 2 2 2
4 10/08/2016 12:05:01 3 4 3
5 10/08/2016 12:05:02 3 6 3
6 10/08/2016 12:05:03 1 3 4
7 10/08/2016 12:05:04 1 2 4
8 10/08/2016 12:05:05 2 4 5
9 10/08/2016 12:05:06 2 6 5
10 10/08/2016 12:05:07 3 4 6
11 10/08/2016 12:05:08 3 2 6
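To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps. It rebuilds only columns A and B from the example above; the names prev and changed are illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question (only A and B are needed here)
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "B": [5, 6, 3, 2, 4, 6, 3, 2, 4, 6, 4, 2],
})

# Step 1: the previous row's value; the first row has no predecessor, so it is NaN
prev = df["A"].shift()

# Step 2: True wherever the value differs from the previous row.
# NaN compares unequal to everything, so the first row is always True (start of run 1).
changed = df["A"] != prev

# Step 3: the running count of True values numbers each run of equal values
df["C"] = changed.cumsum()

print(pd.DataFrame({"A": df["A"], "prev": prev, "changed": changed, "C": df["C"]}))
# C → [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
```

Because only consecutive rows are compared, a value that reappears later (such as A == 1 in rows 6 and 7) starts a new ID instead of reusing the old one, which is exactly what the question asks for. The rows must already be in time order for this to work.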
As a side note, this is a popular pattern for grouping consecutive runs. For example, to get the average B value of each such group:
df.groupby((df.A != df.A.shift()).cumsum()).B.mean()
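Since the real data changes at irregular intervals, it may also help to see when each run starts and ends. The sketch below reuses the same run ID as a groupby key and uses pandas named aggregation. It assumes DATE_TIME is month/day/year; within a single day the day-vs-month order does not affect the durations. The name run is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "DATE_TIME": [
        "10/08/2016 12:04:56", "10/08/2016 12:04:58", "10/08/2016 12:04:59",
        "10/08/2016 12:05:00", "10/08/2016 12:05:01", "10/08/2016 12:05:02",
        "10/08/2016 12:05:03", "10/08/2016 12:05:04", "10/08/2016 12:05:05",
        "10/08/2016 12:05:06", "10/08/2016 12:05:07", "10/08/2016 12:05:08",
    ],
    "A": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "B": [5, 6, 3, 2, 4, 6, 3, 2, 4, 6, 4, 2],
})

# Assumption: month/day/year order; adjust the format string to match the real data
df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], format="%m/%d/%Y %H:%M:%S")

# Same run ID as in the answer, renamed so it does not clash with column A
run = (df["A"] != df["A"].shift()).cumsum().rename("run")

# Average B per run (the one-liner from the answer)
mean_b = df.groupby(run)["B"].mean()

# Per-run summary: the run's A value, its first and last timestamps, and mean B
summary = df.groupby(run).agg(
    A=("A", "first"),
    start=("DATE_TIME", "first"),
    end=("DATE_TIME", "last"),
    mean_B=("B", "mean"),
)
summary["duration"] = summary["end"] - summary["start"]
print(summary)
```

Here duration is measured from the first to the last row of each run, not up to the first row of the next run; for the example data mean_b comes out as 5.5, 2.5, 5.0, 2.5, 5.0, 3.0.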