按时间序列分类 [英] Disaggregate in the context of a time series

查看:68
本文介绍了按时间序列分类的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个数据集,我想整体可视化并按几个不同的变量进行分类。我创建了一个带有玩具发光应用程序的flexdashboard,以选择分类的类型,并使用工作代码来绘制正确的子集。

I have a dataset that I want to visualize overall and disaggregated by a few different variables. I created a flexdashboard with a toy shiny app to select the type of disaggregation, and working code to plot the correct subset.

我的方法是重复性的,这暗示了对我来说,我没有找到更好的方法来做到这一点。让我大吃一惊的是,需要按日期计数并扩展矩阵。我不确定如何在一个管道中按周获取组计数。我分几个步骤进行操作并合并。

My approach is repetitive, which is a hint to me that I'm missing out on a better way to do this. The piece that's tripping me up is the need to count by date and expand the matrix. I'm not sure how get group counts by week in one pipe. I do it in several steps and combine.

有想法吗?

(请问我在< a href = https://community.rstudio.com/t/disaggregate-in-the-context-of-a-time-series/14007 rel = nofollow noreferrer> RStudio社区,但我认为这可能更像是 SO问题。我没有从RSC删除该问题的权限,因此对交叉发布表示歉意。)

(ps. I asked this question on RStudio Community, but I think it's probably more of a "SO question". I don't have permissions to delete it from RSC, so apologies for the cross-post.)

---
title: "test"
output: 
  flexdashboard::flex_dashboard:
    theme: bootstrap
runtime: shiny
---

```{r setup, include=FALSE}
  library(flexdashboard)
  library(tidyverse)
  library(tibbletime)
  library(dygraphs)
  library(magrittr)
  library(xts)
```

```{r global, include=FALSE}
  set.seed(1)
  dat <- data.frame(date = seq(as.Date("2018-01-01"), 
                               as.Date("2018-06-30"), 
                               "days"),
                    sex = sample(c("male", "female"), 181, replace=TRUE),
                    lang = sample(c("english", "spanish"), 181, replace=TRUE),
                    age = sample(20:35, 181, replace=TRUE))
  dat <- sample_n(dat, 80)
```

Sidebar {.sidebar}
=====================================

```{r}
  radioButtons("diss", label = "Disaggregation",
    choices = list("All" = 1, "By Sex" = 2, "By Language" = 3), 
    selected = 1)
```

Page 1
=====================================

```{r}
# all
  all <- reactive(
  dat %>%  
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>% # convert to tibble time object
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total = 0)) 
  )

# males only
  males <- reactive(
  dat %>%  
    filter(sex=="male") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_m = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_m = 0)) 
  )

# females only
  females <- reactive(
  dat %>%  
    filter(sex=="female") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_f = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_f = 0)) 
  )

# english only
  english <- reactive(
  dat %>%  
    filter(lang=="english") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_e = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_e = 0)) 
  )

# spanish only
  spanish <- reactive(
  dat %>%  
    filter(lang=="spanish") %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    as_tbl_time(index = date) %>%
    select(date, new) %>%
    collapse_by('1 week', side="start", clean=TRUE) %>%
    group_by(date) %>%
    mutate(total_s = sum(new, na.rm=TRUE)) %>% 
    distinct(date, .keep_all = TRUE) %>% 
    ungroup() %>%
  # expand matrix to include weeks without data
    complete(date = seq(date[1],
                        date[length(date)],
                        by = "1 week"),
             fill = list(total_s = 0)) 
  )

# combine

  totals <- reactive({

  all <- all()
  females <- females()
  males <- males()
  english <- english()
  spanish <- spanish()

  all %>%
    select(date, total) %>%
    full_join(select(females, date, total_f), by = "date") %>%
    full_join(select(males, date, total_m), by = "date") %>%
    full_join(select(english, date, total_e), by = "date") %>%
    full_join(select(spanish, date, total_s), by = "date") 
  })

# convert to xts
  totals_ <- reactive({
    totals <- totals()
    xts(totals, order.by = totals$date)
  })

# plot
  renderDygraph({

  totals_ <- totals_()

  if (input$diss == 1) {
  dygraph(totals_[, "total"],
          main= "All") %>%
    dySeries("total", label = "All") %>%
    dyRangeSelector() %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else if (input$diss == 2) {
    dygraph(totals_[, c("total_f", "total_m")],
            main = "By sex") %>%
    dyRangeSelector() %>%
    dySeries("total_f", label = "Female") %>%
    dySeries("total_m", label = "Male") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else {
    dygraph(totals_[, c("total_e", "total_s")],
            main = "By language") %>%
    dyRangeSelector() %>%
    dySeries("total_e", label = "English") %>%
    dySeries("total_s", label = "Spanish") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE)
  }
  })
```

更新:

@Jon Spring建议编写一个减少重复次数的函数(如下),这是一个很好的改进。但是,基本方法是相同的。分割,计算,合并,绘图。有没有办法做到这一点而又不会分裂并重新组合在一起?

@Jon Spring suggested writing a function to reduce some repetition (applied below), which is a nice improvement. The basic approach is the same, however. Segment, calculate, combine, plot. Is there a way to do this without breaking apart and putting back together?

---
title: "test"
output: 
  flexdashboard::flex_dashboard:
    theme: bootstrap
runtime: shiny
---

```{r setup, include=FALSE}
  library(flexdashboard)
  library(tidyverse)
  library(tibbletime)
  library(dygraphs)
  library(magrittr)
  library(xts)
```

```{r global, include=FALSE}
# generate data
  set.seed(1)
  dat <- data.frame(date = seq(as.Date("2018-01-01"), 
                               as.Date("2018-06-30"), 
                               "days"),
                    sex = sample(c("male", "female"), 181, replace=TRUE),
                    lang = sample(c("english", "spanish"), 181, replace=TRUE),
                    age = sample(20:35, 181, replace=TRUE))
  dat <- sample_n(dat, 80)

# Jon Spring's function
  prep_dat <- function(filtered_dat, col_name = "total") {
  filtered_dat %>%
    mutate(new = 1) %>%
    arrange(date) %>%
  # time series analysis
    tibbletime::as_tbl_time(index = date) %>% # convert to tibble time object
    select(date, new) %>%
    tibbletime::collapse_by("1 week", side = "start", clean = TRUE) %>%
    group_by(date) %>%
    mutate(total = sum(new, na.rm = TRUE)) %>%
    distinct(date, .keep_all = TRUE) %>%
    ungroup() %>%
    # expand matrix to include weeks without data
    complete(
      date = seq(date[1], date[length(date)], by = "1 week"),
      fill = list(total = 0)
    )
  }
```

Sidebar {.sidebar}
=====================================

```{r}
  radioButtons("diss", label = "Disaggregation",
    choices = list("All" = 1, "By Sex" = 2, "By Language" = 3), 
    selected = 1)
```

Page 1
=====================================

```{r}
# all
  all <- reactive(
  prep_dat(dat) 
  )

# males only
  males <- reactive(
  prep_dat(
    dat %>% 
    filter(sex == "male")
  ) %>% 
    rename("total_m" = "total")
  )

# females only
  females <- reactive(
  prep_dat(
    dat %>% 
    filter(sex == "female")
  ) %>% 
    rename("total_f" = "total")
  )

# english only
  english <- reactive(
  prep_dat(
    dat %>% 
    filter(lang == "english")
  ) %>% 
    rename("total_e" = "total")
  )

# spanish only
  spanish <- reactive(
  prep_dat(
    dat %>% 
    filter(lang == "spanish")
  ) %>% 
    rename("total_s" = "total")
  )

# combine

  totals <- reactive({

  all <- all()
  females <- females()
  males <- males()
  english <- english()
  spanish <- spanish()

  all %>%
    select(date, total) %>%
    full_join(select(females, date, total_f), by = "date") %>%
    full_join(select(males, date, total_m), by = "date") %>%
    full_join(select(english, date, total_e), by = "date") %>%
    full_join(select(spanish, date, total_s), by = "date") 
  })

# convert to xts
  totals_ <- reactive({
    totals <- totals()
    xts(totals, order.by = totals$date)
  })

# plot
  renderDygraph({

  totals_ <- totals_()

  if (input$diss == 1) {
  dygraph(totals_[, "total"],
          main= "All") %>%
    dySeries("total", label = "All") %>%
    dyRangeSelector() %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else if (input$diss == 2) {
    dygraph(totals_[, c("total_f", "total_m")],
            main = "By sex") %>%
    dyRangeSelector() %>%
    dySeries("total_f", label = "Female") %>%
    dySeries("total_m", label = "Male") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE) 
  } else {
    dygraph(totals_[, c("total_e", "total_s")],
            main = "By language") %>%
    dyRangeSelector() %>%
    dySeries("total_e", label = "English") %>%
    dySeries("total_s", label = "Spanish") %>%
    dyOptions(useDataTimezone = FALSE,
              stepPlot = TRUE,
              drawGrid = FALSE,
              fillGraph = TRUE)
  }
  })
```


推荐答案

感谢您详细说明您的目标。我认为@ simon-s-a建议的方法将简化事情。如果我们可以动态地运行分组并进行结构化,以便我们不必事先知道这些组中的可能组件,那么维护起来会容易得多。

Thanks for explaining more about your goals. I think the approach @simon-s-a suggests will simplify things. If we can run the grouping dynamically, and structure it so that we don't need to know the possible components in those groups beforehand, it will be a lot easier to maintain.

这是最小的可行产品,可重新构建绘图功能以在其中包含分组逻辑。

Here's a minimum viable product that rebuilds the plotting function to include the grouping logic inside it.


  1. 按日期分组后,无论我们的分组变量是多少,它都会计算每个组有多少行,然后将它们分散开来组获得一列。

  1. Once grouped by date and whatever our grouping variable is, it counts how many rows each group has, then spreads those so each group gets a column.

然后我使用 padr :: pad 填充介于两者之间的所有缺少的时间行,并替换所有NA都为零。

Then I use padr::pad to pad out any missing time rows in between, and replace all the NA's with zeros.

最后,该数据帧被转换为 xts 对象并输入到dygraph中,自动处理多列。

Finally, that data frame is converted to an xts object and fed into dygraph, which seems to handle the multiple columns automatically.

此处:

---
title: "test"
output: 
  flexdashboard::flex_dashboard:
    theme: bootstrap
runtime: shiny
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(tibbletime)
library(dygraphs)
library(magrittr)
library(xts)
```

```{r global, include=FALSE}
# generate data
set.seed(1)
dat <- data.frame(date = seq(as.Date("2018-01-01"), 
                             as.Date("2018-06-30"), 
                             "days"),
                  sex = sample(c("male", "female"), 181, replace=TRUE),
                  lang = sample(c("english", "spanish"), 181, replace=TRUE),
                  age = sample(20:35, 181, replace=TRUE))
dat <- dplyr::sample_n(dat, 80)
```

Sidebar {.sidebar}
=====================================

```{r}

radioButtons("diss", label = "Disaggregation",
             choices = list("All" = "Total",
                            "By Sex" = "sex",
                            "By Language" = "lang"), 
             selected = "Total")
```

Page 1
=====================================

```{r plot}

renderDygraph({
  grp_col <- rlang::sym(input$diss) # This converts the input selection to a symbol

  dat %>%
    mutate(Total = 1) %>% # This is a hack to let us "group" by Total -- all one group

    # Here's where we unquote the symbol so that dplyr can use it 
    #   to refer to a column. In this case I make a dummy column 
    #   that's a copy of whatever column we want to group
    mutate(my_group = !!grp_col) %>%

    # Now we make a group for every existing combination of week 
    #   (using lubridate::floor_date) and level of our grouping column,
    #   count how many rows in each group, and spread that to wide format.
    group_by(date = lubridate::floor_date(date, "1 week"), my_group) %>%
    count() %>% spread(my_group, n) %>% ungroup() %>%

    # padr:pad() fills in any missing weeks in the sequence with new rows
    #   Then we replace all the NA's with zeroes.
    padr::pad() %>% replace(is.na(.), 0) %>%

    # Finally we can convert to xts and feed the wide table into digraph.
    xts::xts(order.by = .$date) %>%
    dygraph() %>%
    dyRangeSelector() %>%
    dyOptions(
      useDataTimezone = FALSE, stepPlot = TRUE,
      drawGrid = FALSE, fillGraph = TRUE
    )
})
```

这篇关于按时间序列分类的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆