Calculating time series days-in-a-row that fit certain criteria
Problem description
I have a spatio-temporal df:
date       | spatial_pixel | column_A | ...
-----------+---------------+----------+----
2012-04-01 | 1000          | 5        | ...
2012-04-01 | 1001          | 1        | ...
...        | ...           | ...      | ...
I want a column (computed per 'spatial_pixel' and ordered by 'date') that counts the number of consecutive days on which a boolean condition is met, say 'column_A' < 2:
date       | spatial_pixel | column_A | days-in-a-row | ...
-----------+---------------+----------+---------------+----
2012-03-30 | 1001          | 5        | 0             | ...
2012-04-01 | 1001          | 1        | 1             | ...
2012-04-02 | 1001          | 1        | 2             | ...
2012-04-03 | 1001          | 3        | 0             | ...
...        | ...           | ...      | ...           | ...
My attempt:
First, I made a new dataframe in which, whenever the boolean is True ('column_A' < 2), the day of the month (e.g. 1, 2, 3, ..., 28, 29, 30) is written, and NaN otherwise. (However, I need it to range from 1 to 365, so that the end of one month and the beginning of the next are easily identified as consecutive.)
date       | spatial_pixel | column_A | day | ...
-----------+---------------+----------+-----+----
2012-03-30 | 1001          | 5        | NaN | ...
2012-04-01 | 1001          | 1        | 1   | ...
2012-04-02 | 1001          | 1        | 2   | ...
2012-04-03 | 1001          | 3        | NaN | ...
2012-04-30 | 1001          | 1        | 30  | ...
2012-04-31 | 1001          | 1        | 31  | ...
...        | ...           | ...      | ... | ...
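A minimal sketch of the day-numbering step, assuming 'date' is a datetime64 column; the sample rows below are hypothetical. It illustrates why day-of-year numbering only partly helps: `dt.dayofyear` runs from 1 to 366 in a leap year such as 2012 and still resets on 1 January, whereas counting whole days since a fixed origin (1970-01-01 here, an arbitrary choice) makes any two consecutive dates differ by exactly 1.

```python
import pandas as pd

# Hypothetical sample rows spanning a month boundary and a year boundary
df = pd.DataFrame({
    'date': pd.to_datetime(['2012-04-30', '2012-05-01', '2012-05-02',
                            '2012-12-31', '2013-01-01']),
    'spatial_pixel': [1001] * 5,
    'column_A': [1, 1, 5, 1, 1],
})
cond = df['column_A'] < 2

# Day of year: 1-366 in 2012 (leap year), but it resets to 1 on 1 January
df['day_of_year'] = df['date'].dt.dayofyear.where(cond)
# Whole days since a fixed origin never reset, so consecutive dates differ by 1
df['day_number'] = (df['date'] - pd.Timestamp('1970-01-01')).dt.days.where(cond)
print(df)
```

Rows where the condition is False get NaN in both columns, mirroring the 'day' column above.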
Second, I unsuccessfully tried to create a new column that counts how many consecutive days there are, using code modified from @ZJS's answer to "Pandas: conditional rolling count":
def rolling_count(val):
    if val == rolling_count.previous + 1:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

rolling_count.count = 0  # static variable
rolling_count.previous = None  # static variable
df['count'] == df.groupby(['spatial_pixel','date'])['day'].apply(rolling_count)
KeyError: 'count'
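The attempt above fails for several reasons: `df['count'] == ...` compares instead of assigning, so pandas looks up the non-existent column 'count' and raises the KeyError; grouping by 'date' as well puts each row in its own group; `apply` on a grouped Series receives each group's whole Series rather than single values; and `rolling_count.previous` starts as None (so `None + 1` raises a TypeError) and is only updated when a run breaks, so a run could never count past 2. Below is a minimal vectorized sketch (not @ZJS's function) that also treats a missing calendar day as a break, assuming 'date' is datetime64 and each pixel has at most one row per date. The function name `add_days_in_a_row` and the sample rows are hypothetical.

```python
import pandas as pd

def add_days_in_a_row(df, threshold=2):
    """Count consecutive calendar days with column_A < threshold, per spatial_pixel."""
    out = df.sort_values(['spatial_pixel', 'date']).copy()
    cond = out['column_A'] < threshold
    # A run breaks when the previous row of the same pixel is not exactly one day
    # earlier; this is also True for each pixel's first row, where diff() is NaT
    gap = out.groupby('spatial_pixel')['date'].diff() != pd.Timedelta(days=1)
    # Every False row or date gap starts a new run id
    run_id = (~cond | gap).cumsum()
    # Within a run, count True rows cumulatively; a run's leading False row stays 0
    out['days-in-a-row'] = cond.astype(int).groupby(run_id).cumsum()
    return out

# Hypothetical sample rows modeled on the question's table (one pixel, month boundary)
df = pd.DataFrame({
    'date': pd.to_datetime(['2012-03-30', '2012-04-01', '2012-04-02',
                            '2012-04-03', '2012-04-30', '2012-05-01']),
    'spatial_pixel': [1001] * 6,
    'column_A': [5, 1, 1, 3, 1, 1],
})
result = add_days_in_a_row(df)
print(result)
```

Because the date difference is computed directly, no 1-365 day numbering is needed, and 2012-04-30 followed by 2012-05-01 counts as consecutive.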
Any help would be greatly appreciated!
Recommended answer
If I understand you correctly (IIUYC), here is my approach to this problem:
import pandas as pd
from datetime import datetime
df = pd.DataFrame(
[
[datetime(2016, 1, 1), 1000, 5],
[datetime(2016, 1, 1), 1001, 1],
[datetime(2016, 1, 2), 1000, 1],
[datetime(2016, 1, 2), 1001, 1],
[datetime(2016, 1, 3), 1000, 1],
[datetime(2016, 1, 3), 1001, 5],
[datetime(2016, 1, 4), 1000, 1],
[datetime(2016, 1, 4), 1001, 1],
],
columns=['date', 'spatial_pixel', 'column_A']
)
df
# date spatial_pixel column_A
# 0 2016-01-01 1000 5
# 1 2016-01-01 1001 1
# 2 2016-01-02 1000 1
# 3 2016-01-02 1001 1
# 4 2016-01-03 1000 1
# 5 2016-01-03 1001 5
# 6 2016-01-04 1000 1
# 7 2016-01-04 1001 1
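def sum_days_in_row_with_condition(g):
    # Sort each pixel's rows chronologically so "in a row" follows date order
    sorted_g = g.sort_values(by='date', ascending=True)
    condition = sorted_g['column_A'] < 2
    # Running count of True rows minus that count frozen at the last False row;
    # fillna(0) covers rows before the first False row (e.g. pixel 1001), where
    # ffill leaves NaN and astype(int) would otherwise raise
    sorted_g['days-in-a-row'] = condition.cumsum() - condition.cumsum().where(~condition).ffill().fillna(0).astype(int)
    return sorted_g

result = (df.groupby('spatial_pixel')
          .apply(sum_days_in_row_with_condition)
          .reset_index(drop=True))
print(result)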
# date spatial_pixel column_A days-in-a-row
# 0 2016-01-01 1000 5 0
# 1 2016-01-02 1000 1 1
# 2 2016-01-03 1000 1 2
# 3 2016-01-04 1000 1 3
# 4 2016-01-01 1001 1 1
# 5 2016-01-02 1001 1 2
# 6 2016-01-03 1001 5 0
# 7 2016-01-04 1001 1 1
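To make the cumsum expression easier to follow, the sketch below traces its intermediate Series for pixel 1001, whose condition values in date order are True, True, False, True. It adds a `fillna(0)` guard for rows before the first False value, where `ffill` has nothing to carry forward.

```python
import pandas as pd

# Condition for pixel 1001 in date order (column_A = 1, 1, 5, 1, tested with < 2)
condition = pd.Series([True, True, False, True])

running = condition.cumsum()  # 1, 2, 2, 3
# Running total frozen at the most recent False row; rows before any False get 0
frozen = running.where(~condition).ffill().fillna(0).astype(int)  # 0, 0, 2, 2
days_in_a_row = running - frozen  # 1, 2, 0, 1
print(pd.DataFrame({'condition': condition, 'cumsum': running,
                    'last_false_cumsum': frozen, 'days-in-a-row': days_in_a_row}))
```

Note that this counts consecutive rows, so a missing date inside a run is not treated as a break.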