How can I generate lags of data nested in multiple columns?
Problem description
My goal is to calculate growth rates for age groups that are nested in spatial units over time. The data frame I am working with is structured like this (but much larger):
set.seed(1234)
df <- data.frame(Time = c(1,1,1,1,2,2,2,2,3,3,3,3),
Region = rep(c("A", "A", "B", "B"),3),
Age = rep(c(1,2), 6),
No_Persons = round(rnorm(12, mean = 10),0))
Within each category in Region, I need to get the change in No_Persons from one year to another (Time) and between Age groups 1 and 2. So basically the operation would be Age2_Time2 / Age1_Time1. I tried achieving my goal using various lag functions as well as data.table::shift, but couldn't get it to work. For example, I thought this would give me the desired results, but it only spits out NAs:
library(tidyverse)
df %>%
group_by(Region) %>%
mutate(Ratio = No_Persons / dplyr::lag(No_Persons,
n = 1,
order_by = "Age"))
I get the right results by using pivot_wider and then manually calculating growth rates by working with the columns, like this:
df %>%
pivot_wider(names_from = "Age", values_from = "No_Persons") %>%
group_by(Region) %>%
mutate(Ratio = `2` / dplyr::lag(`1`, order_by = Time))
# A tibble: 6 x 5
# Groups: Region [2]
Time Region `1` `2` Ratio
<dbl> <chr> <dbl> <dbl> <dbl>
1 1 A 9 10 NA
2 1 B 11 8 NA
3 2 A 10 11 1.22
4 2 B 9 9 0.818
5 3 A 9 9 0.9
6 3 B 10 9 1
However, since the original data set has many more age groups, this becomes tedious and prone to error. I'd much prefer a programmatic solution.
Recommended Answer
Updated Answer
Based on your comment, I rebuilt a minimal data set df with 3 time points, 2 regions, and 3 age groups.
set.seed(1234)
time.number = 3
region.number = 2
age.number = 3
total.number = time.number * region.number * age.number
df <-
data.frame(
Time = rep(1:time.number, each = region.number * age.number),
Region = rep(LETTERS[1:region.number], each = age.number),
Age = rep(seq(1, age.number), region.number),
No_Persons = round(rnorm(total.number, mean = 10), 0)
)
df
The following solution should also apply to your real data.
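library(data.table)
library(magrittr)
# convert df to a data.table by reference
setDT(df)
# derive the number of age groups and regions from the real data
age.number <- df[, Age] %>% unique() %>% length()
region.number <- df[, Region] %>% unique() %>% length()
# V1: age groups 1..(n-1), V2: age groups 2..n within each Time/Region group.
# Parentheses matter: 1:age.number-1 parses as (1:age.number) - 1.
# data.table::shift() is used instead of lag(), which resolves to
# stats::lag() here unless dplyr is attached; it lags V1 by a plain
# row offset of region.number rows.
df[, .(V1 = .SD[1:(age.number - 1), No_Persons],
       V2 = .SD[2:age.number, No_Persons]),
   by = .(Time, Region)][, Radio := V2 / shift(V1, region.number)][]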
Result:
Time Region V1 V2 Radio
1: 1 A 9 10 NA
2: 1 A 10 11 NA
3: 1 B 8 10 1.111111
4: 1 B 10 11 1.100000
5: 2 A 9 9 1.125000
6: 2 A 9 9 0.900000
7: 2 B 9 10 1.111111
8: 2 B 10 9 1.000000
9: 3 A 9 10 1.111111
10: 3 A 10 11 1.100000
11: 3 B 10 9 1.000000
12: 3 B 9 9 0.900000
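A possible alternative, sketched under assumptions rather than taken from the answer: a fixed row offset depends on the row order and on how many rows each (Time, Region) group contributes, so one could instead shift the Time and Age labels by one and join the table to itself. That pairs every row with the row from the same Region at the previous Time and the previous Age group. The names dt, prev, Prev_Persons and res are made up for this illustration, the No_Persons values are copied from the result table above, and the sketch assumes Time and Age are consecutive integers.

```r
library(data.table)

# Toy data: 3 time points, 2 regions, 3 age groups.
# No_Persons values are copied from the result table above.
dt <- data.table(
  Time   = rep(1:3, each = 6),
  Region = rep(rep(c("A", "B"), each = 3), times = 3),
  Age    = rep(1:3, times = 6),
  No_Persons = c(9, 10, 11,  8, 10, 11,
                 9,  9,  9,  9, 10,  9,
                 9, 10, 11, 10,  9,  9)
)

# Hypothetical helper table: relabel each row with the (Time, Age)
# cell it should be compared against, i.e. one step later in both.
prev <- dt[, .(Region,
               Time = Time + 1L,
               Age  = Age + 1L,
               Prev_Persons = No_Persons)]

# Inner join: only rows with a predecessor in the same Region survive,
# so the first Time and the first Age group drop out.
res <- merge(dt, prev, by = c("Region", "Time", "Age"))
res[, Ratio := No_Persons / Prev_Persons]

res[order(Time, Region, Age)]
```

Because the match is made on Region, Time and Age, the result does not depend on how the rows are sorted, and it does not need the number of regions or age groups as input. If rows without a predecessor should be kept with an NA ratio, merge(..., all.x = TRUE) would be one way to do that.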
Previous Answer
I'm not sure this is exactly the result you want, but it does produce the right numbers.
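library(data.table)
# V1 = age group 1, V2 = age group 2 within each Time/Region group.
# shift() replaces lag(), which is stats::lag() unless dplyr is attached;
# a lag of 2 rows equals one time step when there are two regions.
setDT(df)[, .(V1 = No_Persons[seq(1, .N, 2)],
              V2 = No_Persons[seq(2, .N, 2)]),
          by = .(Time, Region)][, Radio := V2 / shift(V1, 2)]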