Sliding windows - measuring length of observations on each looped window

Question

Let's analyse this sample code, where zip() is used to create sliding windows over a dataset and return them one at a time in a loop.

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

# pair each month with the month that follows it
for x, y in zip(months, months[1:]):
    print(x, y)

# Output of each window will be:
# Jan Feb
# Feb Mar
# Mar Apr
# Apr May

Now suppose I want to calculate, for each window, the length of a month as a percentage of the length of the whole window.

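Step-by-step example:

  1. When the first window (Jan, Feb) is returned, I want to calculate the length of Jan as a percentage of the whole window (Jan + Feb) and store it in a new variable
  2. When the second window (Feb, Mar) is returned, I want to calculate the length of Feb as a percentage of the whole window (Feb + Mar) and store it in a new variable
  3. Continue this process until the last window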

Any suggestions on how I might implement this idea in the for loop are welcome!

Thanks!

EDIT

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May']

# three-month windows need three offset copies of the list
for x, y, z in zip(months, months[1:], months[2:]):
    print(x, y, z)

# Output of each window will be:
# Jan Feb Mar
# Feb Mar Apr
# Mar Apr May

The goal is to calculate, for each window, the length of its first two months as a fraction of the full window length:

  • First window: (Jan + Feb) / (Jan + Feb + Mar)
  • Second window: (Feb + Mar) / (Feb + Mar + Apr)
  • Continue until the last window

We can now calculate one month over the size of each window (with start.month). However, how do we adapt this to include more than one month?

Also, instead of using days_in_month, would there be a way to use the number of datapoints (rows) in each month?

By the number of datapoints (rows) I mean that each month has many datapoints at a fixed time interval (e.g., every 60 minutes). This would imply that 1 day in a month has 24 different datapoints (rows). Example:

                         column
rows             
01-Jan-2010 T00:00        value
01-Jan-2010 T01:00        value
01-Jan-2010 T02:00        value
...                       ...
01-Jan-2010 T23:00        value
02-Jan-2010 T00:00        value
...                       ...

Thanks!

Recommended answer

Here is one way. (In my case, months is a PeriodIndex created with pd.period_range.)

import pandas as pd
months = pd.period_range(start='2020-01', periods=5, freq='M')

Now, iterate over the range. Each iteration is a two-month window.

# print header labels
print('{:10s} {:10s} {:>10s} {:>10s} {:>10s} {:>10s} '.format(
    'start', 'end', 'month', 'front (d)', 'total (d)', 'frac'))

for start, end in zip(months, months[1:]):
    front_month = start.month

    # number of days in first month (e.g., Jan)
    front_month_days = start.days_in_month

    # number of days in current sliding window (e.g., Jan + Feb).
    # Summing days_in_month gives whole days; end.end_time is the last
    # nanosecond of the month, so (end.end_time - start.start_time).days
    # would come out one day short.
    days_in_curr_window = start.days_in_month + end.days_in_month

    # share of the window taken up by the first month
    frac = front_month_days / days_in_curr_window

    print('{:10s} {:10s} {:10d} {:10d} {:10d} {:10.3f}'.format(
        str(start), str(end), front_month,
        front_month_days, days_in_curr_window, frac))


start      end             month  front (d)  total (d)       frac 
2020-01    2020-02             1         31         60      0.517
2020-02    2020-03             2         29         60      0.483
2020-03    2020-04             3         31         61      0.508
2020-04    2020-05             4         30         61      0.492
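
The EDIT in the question asks how to cover more than one month per window, and how to use row counts instead of days_in_month. The following is a minimal sketch of one possible approach, not part of the original answer. The helper name leading_fraction, its parameters window and lead, and the synthetic hourly DataFrame are all assumptions made for illustration. The idea is to build one "length" per month (either calendar days or the number of rows in that month) and then divide the sum of the first lead months by the sum of the whole window.

```python
import pandas as pd


def leading_fraction(lengths, window=3, lead=2):
    """Share of each sliding window taken up by its first `lead` months.

    lengths: Series with one length per month (days, rows, ...), in order.
    Returns a Series indexed by the first month of each window.
    (Hypothetical helper, written for this example.)
    """
    if not 0 < lead <= window:
        raise ValueError("need 0 < lead <= window")
    # number of complete windows that fit in the data (0 if too short)
    n_windows = max(len(lengths) - window + 1, 0)
    values = [
        lengths.iloc[i:i + lead].sum() / lengths.iloc[i:i + window].sum()
        for i in range(n_windows)
    ]
    return pd.Series(values, index=lengths.index[:n_windows], dtype=float)


months = pd.period_range(start='2020-01', periods=5, freq='M')

# Option 1: month length measured in calendar days.
days = pd.Series([m.days_in_month for m in months], index=months)
frac_days = leading_fraction(days, window=3, lead=2)

# Option 2: month length measured in rows of an hourly DataFrame
# (synthetic data, 24 rows per day, for illustration only).
hours = pd.date_range('2020-01-01 00:00', '2020-05-31 23:00',
                      freq=pd.offsets.Hour())
df = pd.DataFrame({'column': 0.0}, index=hours)
rows_per_month = df.groupby(df.index.to_period('M')).size()
frac_rows = leading_fraction(rows_per_month, window=3, lead=2)

print(frac_days.round(3))
print(frac_rows.round(3))
```

With complete hourly data the two options give the same fractions, because every day contributes exactly 24 rows. They differ once rows are missing, which is when counting rows is the more faithful measure. Calling leading_fraction(days, window=2, lead=1) should reproduce the two-month case from the answer above.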
