Sliding windows - measuring length of observations on each looped window
Question
Let's analyse this sample code, where zip() is used to create sliding windows from a dataset and return them in a loop.
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
for x, y in zip(months, months[1:]):
    print(x, y)
# Output of each window will be:
# Jan Feb
# Feb Mar
# Mar Apr
# Apr May
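Now suppose I want to calculate, for each window, the percentage of the window's length that its first month accounts for.
Step-by-step example:
- When the first window (Jan, Feb) is returned, I want to calculate the length of Jan as a percentage of the whole window (Jan + Feb) and return it as a new variable
- When the second window (Feb, Mar) is returned, I want to calculate the length of Feb as a percentage of the whole window (Feb + Mar) and return it as a new variable
- Continue this process until the last window
Any suggestions on how I might implement this idea in the for loop are welcome!
Thanks!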
Edit
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']
# three staggered slices give three-month windows
for x, y, z in zip(months, months[1:], months[2:]):
    print(x, y, z)
# Output of each window will be:
# Jan Feb Mar
# Feb Mar Apr
# Mar Apr May
The goal is to calculate the length of the first two months of each window over the full window length:
- First window: (Jan + Feb) / (Jan + Feb + Mar)
- Second window: (Feb + Mar) / (Feb + Mar + Apr)
- Continue until the last window
We can now calculate one month over the size of each window (with start.month). However, how do we adapt this to include more than one month?
Also, instead of using days_in_month, would there be a way to use the number of datapoints (rows) in each month?
By number of datapoints (rows) I mean that each month has many datapoints at a fixed time interval (e.g., every 60 minutes), so 1 day in a month would have 24 different datapoints (rows). Example:
column              rows
01-Jan-2010 T00:00  value
01-Jan-2010 T01:00  value
01-Jan-2010 T02:00  value
...                 ...
01-Jan-2010 T23:00  value
02-Jan-2010 T00:00  value
...                 ...
Thanks!
Recommended answer
Here is one way. (In my case, months is a period_range object.)
import pandas as pd
months = pd.period_range(start='2020-01', periods=5, freq='M')
Now, iterate over the range. Each iteration is a two-month window.
# print header labels
print('{:10s} {:10s} {:>10s} {:>10s} {:>10s} {:>10s} '.format(
    'start', 'end', 'month', 'front (d)', 'total (d)', 'frac'))
for start, end in zip(months, months[1:]):
    front_month = start.month
    # number of days in first month (e.g., Jan)
    front_month_days = start.days_in_month
    # number of days in current sliding window (e.g., Jan + Feb);
    # measured up to the start of the month after `end`, because
    # end.end_time falls just before midnight and .days would drop a day
    days_in_curr_window = ((end + 1).start_time - start.start_time).days
    frac = front_month_days / days_in_curr_window
    print('{:10s} {:10s} {:10d} {:10d} {:10d} {:10.3f}'.format(
        str(start), str(end), front_month,
        front_month_days, days_in_curr_window, frac))
start      end             month  front (d)  total (d)       frac
2020-01    2020-02             1         31         60      0.517
2020-02    2020-03             2         29         60      0.483
2020-03    2020-04             3         31         61      0.508
2020-04    2020-05             4         30         61      0.492
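The follow-up questions in the edit (more than one front month, and row counts instead of days_in_month) can be handled the same way. The following is only a minimal sketch, not part of the original answer: the helper window_fractions and its parameters window and front are hypothetical names, and the hourly DataFrame is made-up data standing in for the real dataset. It assumes each month's "size" is first reduced to one number per month (days or rows), after which a window is just a slice of consecutive months.

```python
import pandas as pd


def window_fractions(sizes, window=3, front=2):
    """Share of the first `front` months within each `window`-month window.

    `sizes` is a Series indexed by monthly Period with one size per month
    (calendar days, row counts, ...). Hypothetical helper for illustration.
    """
    records = []
    for i in range(len(sizes) - window + 1):
        chunk = sizes.iloc[i:i + window]      # months in the current window
        front_size = chunk.iloc[:front].sum() # e.g., Jan + Feb
        total_size = chunk.sum()              # e.g., Jan + Feb + Mar
        records.append({
            'start': chunk.index[0],
            'end': chunk.index[-1],
            'front': front_size,
            'total': total_size,
            'frac': front_size / total_size,
        })
    return pd.DataFrame(records)


months = pd.period_range(start='2020-01', periods=5, freq='M')

# Option 1: month sizes measured in calendar days
days = pd.Series([m.days_in_month for m in months], index=months)
by_days = window_fractions(days, window=3, front=2)

# Option 2: month sizes measured in rows, using made-up hourly data
hours = pd.date_range('2020-01-01', '2020-05-31 23:00',
                      freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'value': range(len(hours))}, index=hours)
rows = df.groupby(df.index.to_period('M')).size()  # rows per month
by_rows = window_fractions(rows, window=3, front=2)

print(by_days)
print(by_rows)
```

With window=2 and front=1 the helper reduces to the two-month case shown above. Counting rows per month also copes with gaps in the data, since a month with missing hours simply contributes fewer rows.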