How can I generate lags of data nested in multiple columns?


Question

My goal is to calculate growth rates for age groups that are nested in spatial units over time. The data frame I am working with is structured like this (but much larger):

set.seed(1234)

df <- data.frame(Time = c(1,1,1,1,2,2,2,2,3,3,3,3),
                 Region = rep(c("A", "A", "B", "B"),3),
                 Age = rep(c(1,2), 6),
                 No_Persons = round(rnorm(12, mean = 10),0))

Within each category in Region, I need to get the change in No_Persons from one year to the next (Time) and between Age groups 1 and 2. So basically the operation would be Age2_Time2 / Age1_Time1. I tried to achieve this using various lag functions as well as data.table::shift, but couldn't get it to work. For example, I thought this would give me the desired results, but it only spits out NAs:

library(tidyverse)

df %>% 
  group_by(Region) %>%
  mutate(Ratio = No_Persons / dplyr::lag(No_Persons,
                                      n = 1,
                                      order_by = "Age"))

I get the right results by using pivot_wider and then manually calculating growth rates by working with the columns, like this:

df %>% 
  pivot_wider(names_from = "Age", values_from = "No_Persons") %>%
  group_by(Region) %>%
  mutate(Ratio = `2` / dplyr::lag(`1`, order_by = Time))

# A tibble: 6 x 5
# Groups:   Region [2]
   Time Region   `1`   `2`  Ratio
  <dbl> <chr>  <dbl> <dbl>  <dbl>
1     1 A          9    10 NA    
2     1 B         11     8 NA    
3     2 A         10    11  1.22 
4     2 B          9     9  0.818
5     3 A          9     9  0.9  
6     3 B         10     9  1    

However, since the original data set has many more age groups, this becomes tedious and prone to error. I'd much prefer a programmatic solution.

Recommended Answer

Updated Answer

Based on your comment, I rebuilt a minimal data set df with 3 time points, 2 regions, and 3 age groups.

set.seed(1234)
time.number = 3
region.number = 2
age.number = 3
total.number = time.number * region.number * age.number
df <-
  data.frame(
    Time = rep(1:time.number, each = region.number * age.number),
    Region = rep(LETTERS[1:region.number], each = age.number),
    Age = rep(seq(1, age.number), region.number),
    No_Persons = round(rnorm(total.number, mean = 10), 0)
  )
df

The following solution should also apply to your real data.

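library(data.table)
library(magrittr)
# convert df to a data.table by reference
setDT(df)

# derive the group counts from the actual data
age.number <- df[, Age] %>% unique() %>% length()
region.number <- df[, Region] %>% unique() %>% length()

# V1 = every age group except the last, V2 = the next-older age group.
# Each (Time, Region) block yields age.number - 1 rows, so the same
# Region and age pair one period earlier sits
# region.number * (age.number - 1) rows above.
# Assumes balanced data sorted by Time, Region, Age.
df[, .(V1 = .SD[1:(age.number - 1), No_Persons],
       V2 = .SD[2:age.number, No_Persons]),
   by = .(Time, Region)][, Ratio := V2 / shift(V1, region.number * (age.number - 1))][]

Result:

    Time Region V1 V2    Ratio
 1:    1      A  9 10       NA
 2:    1      A 10 11       NA
 3:    1      B  8 10       NA
 4:    1      B 10 11       NA
 5:    2      A  9  9 1.000000
 6:    2      A  9  9 0.900000
 7:    2      B  9 10 1.250000
 8:    2      B 10  9 0.900000
 9:    3      A  9 10 1.111111
10:    3      A 10 11 1.222222
11:    3      B 10  9 1.000000
12:    3      B  9  9 0.900000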
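
The positional lag above depends on the rows being sorted and on every Time/Region block holding the same number of age groups. As a hedged alternative, here is a minimal sketch that looks up the denominator by key instead of by position, using a self-join on shifted keys. The names dt, prev, res and Prev_Persons are made up for illustration, the data is a hand-typed copy of the rebuilt example, and the sketch assumes Time and Age are consecutive integers.

```r
library(data.table)

# Hypothetical copy of the rebuilt example data (3 time points, 2 regions, 3 age groups).
dt <- data.table(
  Time       = rep(1:3, each = 6),
  Region     = rep(rep(c("A", "B"), each = 3), times = 3),
  Age        = rep(1:3, times = 6),
  No_Persons = c(9, 10, 11, 8, 10, 11, 9, 9, 9, 9, 10, 9, 9, 10, 11, 10, 9, 9)
)

# Shift the keys forward: the value observed at (Time, Age) becomes the
# denominator for (Time + 1, Age + 1) in the same Region.
prev <- dt[, .(Region, Time = Time + 1L, Age = Age + 1L, Prev_Persons = No_Persons)]

# Left join keeps every original row; rows without a predecessor get NA.
res <- merge(dt, prev, by = c("Region", "Time", "Age"), all.x = TRUE)
res[, Ratio := No_Persons / Prev_Persons]
res[]
```

Because the match is made on Region, Time and Age, a missing age group or an unsorted table yields NA for the affected rows rather than a ratio taken from the wrong region.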

Previous Answer

I'm not sure if this is the result you want, but it does give the right results for the original data.

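library(data.table)
# V1 = age group 1, V2 = age group 2 within each Time/Region block.
# With two regions and one row per block, the previous period's value
# for the same Region sits 2 rows above.
setDT(df)[,.(V1 = No_Persons[seq(1,.N,2)],
             V2 = No_Persons[seq(2,.N,2)]
            ),
          by = .(Time,Region)][,Ratio:=V2/shift(V1,2)]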
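
For the two-age-group case, the hard-coded offset of 2 can be avoided by lagging within each Region. The following is only a sketch under stated assumptions: dt and wide are illustrative names, and the data is a hand-typed copy of the values shown in the question's tibble.

```r
library(data.table)

# Hypothetical copy of the question's data (3 time points, 2 regions, 2 age groups).
dt <- data.table(
  Time       = rep(1:3, each = 4),
  Region     = rep(c("A", "A", "B", "B"), times = 3),
  Age        = rep(1:2, times = 6),
  No_Persons = c(9, 10, 11, 8, 10, 11, 9, 9, 9, 9, 10, 9)
)

# One row per Time/Region block: V1 = age group 1, V2 = age group 2.
wide <- dt[, .(V1 = No_Persons[Age == 1], V2 = No_Persons[Age == 2]),
           by = .(Time, Region)]

# Sort so that shift() steps back exactly one period inside each Region.
setorder(wide, Region, Time)
wide[, Ratio := V2 / shift(V1), by = Region]
wide[]
```

Grouping by Region means the lag length no longer has to be adjusted when the number of regions changes.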
