Apply timeseries decomposition (and anomaly detection) over a sliding/tiled window


Question

The anomaly detection methods published and since abandoned by Twitter have been separately forked and maintained in the anomalize package and the hrbrmstr/AnomalyDetection fork. Both have implemented features that are 'tidy'.

Working static versions

library(dplyr)
library(anomalize)  # provides the tidyverse_cran_downloads dataset

tidyverse_cran_downloads %>% 
  filter(package == "tidyr") %>% 
  ungroup() %>% 
  select(-package) -> one_package_only

one_package_only %>% 
  anomalize::time_decompose(count,
                 merge = TRUE,
                 method = "twitter",
                 frequency = "7 days") -> one_package_only_decomp

one_package_only_decomp %>%
  anomalize::anomalize(remainder, method = "iqr") %>%
  anomalize::time_recompose()


one_package_only_decomp %>% 
  select(date, remainder) %>%
  AnomalyDetection::ad_ts(max_anoms = 0.02,
        direction = 'both')

These work as expected.

I would like to apply the Twitter anomaly detection process on a tiled window to my dataset, which is similar in structure to the anomalize::tidyverse_cran_downloads dataset: a regular series of over 100 observations of a value, grouped by a categorical variable.

The tsibble package (which replaces the old tibbletime) has a method to apply a function in a purrr-like syntax via slide, tile and stretch. This can include returning a full data-frame like object, inside another data-frame like object as per purrr. (What a sentence!)

I've gone through the window function vignette but haven't had much luck.

Attempt 1: slide2

The anomalize::decompose_twitter function takes two arguments, data and target

tidyverse_cran_downloads %>%
  mutate(
    Monthly_MA = slide2_dfr(
      .x = .,
      .y = count,
      ~ anomalize::decompose_twitter,
      .size = 5
    )
  )

Error: Element 1 has length 3, not 1 or 425. Call rlang::last_error() to see a backtrace

Maybe I've misunderstood how the .x .y syntax works?
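
A small sketch of how the .x/.y pairing behaves may help here. It uses purrr directly, with sqrt and a made-up data frame as stand-ins rather than the anomalize function, and it is only an assumption that the same mechanics explain the error above. The two points it illustrates: a lambda written as `~ f` returns the function f instead of calling it, and a data frame passed as .x counts as a list of its columns, so its length is the number of columns (3 here) rather than the number of rows.

```r
library(purrr)

# `~ sqrt` builds a lambda whose body is just the symbol `sqrt`,
# so each call returns the function object, not a computed value.
returns_function <- map(c(4, 9), ~ sqrt)
is.function(returns_function[[1]])   # TRUE

# Calling the function inside the lambda applies it to each element.
calls_function <- map_dbl(c(4, 9), ~ sqrt(.x))
calls_function                       # 2 3

# With two inputs, .x and .y are paired element by element,
# so they need compatible lengths.
paired <- map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x + .y)
paired                               # 11 22 33

# A data frame is a list of columns: its length is the column count.
toy <- data.frame(a = 1:5, b = 6:10, c = 11:15)  # made-up data
length(toy)                          # 3
```

If that reading is right, passing the whole data frame as .x alongside a 425-element column as .y would produce exactly this kind of length mismatch.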

Attempt 2: pmap

my_diag <- function(...) {
  data <- tibble(...)
  fit <- anomalize::decompose_twitter(data = data, target = count)
}

tidyverse_cran_downloads %>%
  nest(-package) %>%
  filter(package %in% c("tidyr", "lubridate")) %>%  # just to make it quick
  mutate(diag = purrr::map(data, ~ pslide_dfr(., my_diag, .size = 7)))

Error in stats::stl(., s.window = "periodic", robust = TRUE) : series is not periodic or has less than two periods

Appears something is running, but the period between observations is off somehow or not getting parsed?

Attempt 3: ad_ts

ad_ts only takes one argument, so ignoring the fact that we have yet to find a way to calculate the remainder after decomposition, I should be able to use it via slide. It also expects its x to be:

Time series as a two column data frame where the first column consists of the timestamps and the second column consists of the observations.

So we shouldn't have to do much to the data after it's nested.

tidyverse_cran_downloads %>%
  nest(-package, .key = "my_data") %>%
  mutate(
    Daily_MA = slide_dfr(
      .f = AnomalyDetection::ad_ts,
      .x = my_data
    )
  )

Error in .f(.x[[i]], ...) : data must be a single data frame.

So the function is at least being called, but it's being called with more than a single data frame?

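I want to:

  • Apply the decomposition process via the twitter algorithm, then run anomaly detection on the remainder
  • Use either one of the two anomaly detection packages, or both in combination
  • Apply it over a time window
  • On data grouped by category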

The only way my data set differs is that I have half hourly observations of values over a period of multiple months, and I actually only need the anomalies recalculated each day (i.e. once every 48 observations), where the window looks back over the prior 30 days to decompose and detect them.
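
The window layout described above can be sketched in base R, independent of any windowing package. The sizes come from the description (48 half-hourly observations per day, a 30-day look-back, one recalculation per day); the series itself is made-up data, and mean() is only a placeholder for the decomposition and detection step.

```r
# Sizes taken from the description: 48 observations per day,
# 30-day look-back, recalculated once per day.
obs_per_day <- 48
window_days <- 30
window_size <- obs_per_day * window_days   # 1440 observations per window

# Stand-in series: 35 days of half-hourly values (made-up data).
set.seed(1)
n <- obs_per_day * 35
values <- rnorm(n)

# Window end positions: the first full window, then one step per day.
ends <- seq(from = window_size, to = n, by = obs_per_day)
starts <- ends - window_size + 1

# Apply a function to each window; mean() is a placeholder for
# the decomposition and anomaly detection step.
window_means <- vapply(
  seq_along(ends),
  function(i) mean(values[starts[i]:ends[i]]),
  numeric(1)
)

length(ends)               # 6 windows for 35 days of data
unique(ends - starts + 1)  # 1440, every window has the same size
```

For grouped data, the same index computation would be run once per group, for example inside purrr::map over a nested data column.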

(N.B. I would have tagged tsibble and anomalize, but I don't have the rep to make those tags)

Answer

Approach 2 should work as expected. The error message comes from stl(), which needs at least two seasonal periods to estimate the seasonal component. For example, daily data with a weekly period needs at least two full weeks of observations (stl() only accepts a series longer than twice the period, so more than 14 points) to run. Increasing the window size to .size = 7 * 3 works fine.
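
The period requirement can be checked directly against stats::stl() in base R, without anomalize or tsibble. This is a minimal sketch on a made-up daily series with a weekly pattern; make_series is a hypothetical helper, and the stl() arguments mirror the ones shown in the error message from the question.

```r
# Made-up daily series with a weekly pattern (frequency = 7).
set.seed(42)
make_series <- function(n) {
  ts(rep(c(5, 6, 7, 8, 9, 3, 2), length.out = n) + rnorm(n), frequency = 7)
}

# A 7-observation window covers only one period, so stl() refuses it.
short_fit <- tryCatch(
  stats::stl(make_series(7), s.window = "periodic", robust = TRUE),
  error = function(e) conditionMessage(e)
)
short_fit
# "series is not periodic or has less than two periods"

# A 21-observation window covers three periods, so the fit succeeds.
long_fit <- stats::stl(make_series(21), s.window = "periodic", robust = TRUE)
colnames(long_fit$time.series)
# "seasonal" "trend" "remainder"
```

The 7-observation case corresponds to .size = 7 in the question, and the 21-observation case to .size = 7 * 3.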

my_decomp <- function(...) {
  data <- tibble(...)
  anomalize::decompose_twitter(data, count)
}

library(dplyr)
library(anomalize)
tidyverse_cran_downloads %>%
  group_by(package) %>% 
  tidyr::nest() %>% 
  mutate(diag = purrr::map(data, ~ tsibble::pslide_dfr(., my_decomp, .size = 7 * 3)))
#> # A tibble: 15 x 3
#>    package   data               diag                
#>    <chr>     <list>             <list>              
#>  1 tidyr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  2 lubridate <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  3 dplyr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  4 broom     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  5 tidyquant <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  6 tidytext  <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  7 ggplot2   <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  8 purrr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#>  9 glue      <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 10 stringr   <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 11 forcats   <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 12 knitr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 13 readr     <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 14 tibble    <tibble [425 × 2]> <tibble [8,506 × 5]>
#> 15 tidyverse <tibble [425 × 2]> <tibble [8,506 × 5]>
