Filter columns and repeatedly compare two columns in pairs using R
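Problem description
Given the df below, for each year's actual and predicted values I need to check the direction of change of that year's actual value and predicted value relative to the previous year's actual value: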
df <- structure(list(code = c("M0000273", "M0000357", "M0000545"),
name = c("industry", "agriculture", "service"), `2019_actual` = c(16.78,
9.26, 49.38), `2019_pred` = c(17.78, 10.26, NA), `2020_actual` = c(35.74,
NA, 49.38), `2020_pred` = c(36.74, 66.56, 25.36), `2021_actual` = c(30.74,
83.42, 63.26), `2021_pred` = c(31.74, 84.42, 35.23)), class = "data.frame", row.names = c(NA,
-3L))
Out:
      code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23
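The logic is: if the difference between the current value and the previous year's actual value is positive, negative, or zero, return increase, decrease, or unchanged respectively; if either or both of the values are NA, return NA.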
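The rule above can be sketched as a small base R helper. This is only an illustration: the function name direction and its arguments current and previous are hypothetical and do not appear in the original post.

```r
# Hypothetical helper (not from the original post): map the sign of
# (current - previous) to a direction label.
direction <- function(current, previous) {
  labels <- c("decrease", "unchanged", "increase")
  # sign() returns -1, 0 or 1, and NA if either input is NA.
  # Adding 2 turns that into an index 1..3; an NA index yields NA.
  labels[sign(current - previous) + 2]
}

# 2020_actual compared with 2019_actual for the three rows of df
direction(c(35.74, NA, 49.38), c(16.78, 9.26, 49.38))
```

Because sign() propagates NA, the missing-value case needs no separate branch.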
Expected result:
      code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred 2020_act_direction 2020_pred_direction 2021_act_direction
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74           increase            increase           decrease
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42               <NA>            increase               <NA>
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23          unchanged            decrease           increase
  2021_pred_direction
1            decrease
2                <NA>
3            decrease
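I tried the code below, but it has two problems: 1. it returns an error: Error: unexpected ')' in " )"; 2. if I have many years (say, 2010 through 2020), this is clearly not the right way to get the expected result.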
df %>%
mutate(
`2020_act_direction` = case_when(`2020_actual` - `2019_actual` > 0 ~ 'increase',
`2020_actual` - `2019_actual` < 0 ~ 'decrease',
`2020_actual` - `2019_actual` == 0 ~ 'unchanged',
TRUE ~ NA
),
`2020_pred_direction` = case_when(`2020_pred` - `2019_actual` > 0 ~ 'increase',
`2020_pred` - `2019_actual` < 0 ~ 'decrease',
`2020_pred` - `2019_actual` == 0 ~ 'unchanged',
TRUE ~ NA
)
`2021_act_direction` = case_when(`2021_actual` - `2020_actual` > 0 ~ 'increase',
`2021_actual` - `2020_actual` < 0 ~ 'decrease',
`2021_actual` - `2020_actual` == 0 ~ 'unchanged',
TRUE ~ NA
)
`2021_pred_direction` = case_when(`2021_pred` - `2020_actual` > 0 ~ 'increase',
`2021_pred` - `2020_actual` < 0 ~ 'decrease',
`2021_pred` - `2020_actual` == 0 ~ 'unchanged',
TRUE ~ NA
)
)
How can I handle this?
Recommended answer
Use pivot_longer and pivot_wider to get one row per year/code/name. You can then easily use lag to compare consecutive years.
library(tidyverse)
df <- structure(list(code = c("M0000273", "M0000357", "M0000545"),
name = c("industry", "agriculture", "service"),
`2019_actual` = c(16.78, 9.26, 49.38),
`2019_pred` = c(17.78, 10.26, NA),
`2020_actual` = c(35.74, NA, 49.38),
`2020_pred` = c(36.74, 66.56, 25.36),
`2021_actual` = c(30.74, 83.42, 63.26),
`2021_pred` = c(31.74, 84.42, 35.23)),
class = "data.frame", row.names = c(NA, -3L)) %>%
as_tibble()
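df %>%
  # Long format: split names such as "2019_actual" into year and type
  pivot_longer(cols = c(-code, -name), names_to = c("year", "type"), names_sep = "_") %>%
  # One row per code/name/year, with actual and pred as columns
  pivot_wider(names_from = "type", values_from = "value") %>%
  mutate(year = as.integer(year)) %>%
  group_by(code, name) %>%
  arrange(year) %>%
  # lag(actual) is the previous year's actual value within each group.
  # If a comparison is NA, no condition matches and case_when() returns NA.
  mutate(act_direction = case_when(actual > lag(actual) ~ "increase",
                                   actual < lag(actual) ~ "decrease",
                                   actual == lag(actual) ~ "unchanged"),
         pred_direction = case_when(pred > lag(actual) ~ "increase",
                                    pred < lag(actual) ~ "decrease",
                                    pred == lag(actual) ~ "unchanged"))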
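The pipeline above leaves the data in long format, with one row per code and year. To get back to the wide layout shown in the expected result, one option is a second pivot_wider. The following is a sketch under assumptions and is not part of the original answer: the helper direction and the names long, wide, and first_year are made up for illustration, and the names_glue argument assumes a reasonably recent tidyr.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  code = c("M0000273", "M0000357", "M0000545"),
  name = c("industry", "agriculture", "service"),
  `2019_actual` = c(16.78, 9.26, 49.38),
  `2019_pred`   = c(17.78, 10.26, NA),
  `2020_actual` = c(35.74, NA, 49.38),
  `2020_pred`   = c(36.74, 66.56, 25.36),
  `2021_actual` = c(30.74, 83.42, 63.26),
  `2021_pred`   = c(31.74, 84.42, 35.23),
  check.names = FALSE
)

# Hypothetical helper: label the sign of (current - previous); NA stays NA.
direction <- function(current, previous) {
  c("decrease", "unchanged", "increase")[sign(current - previous) + 2]
}

# Same reshaping as in the answer, with the comparison done by the helper.
long <- df %>%
  pivot_longer(cols = c(-code, -name),
               names_to = c("year", "type"), names_sep = "_") %>%
  pivot_wider(names_from = "type", values_from = "value") %>%
  mutate(year = as.integer(year)) %>%
  group_by(code, name) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(act_direction  = direction(actual, lag(actual)),
         pred_direction = direction(pred, lag(actual))) %>%
  ungroup()

# The first year has no previous year, so its direction columns are all NA.
first_year <- min(long$year)

wide <- long %>%
  pivot_wider(names_from = year,
              values_from = c(actual, pred, act_direction, pred_direction),
              names_glue = "{year}_{.value}") %>%
  select(-matches(paste0("^", first_year, "_(act|pred)_direction$")))

wide
```

Because the years come from the column names, the same code should work unchanged for a longer range such as 2010 to 2020, as long as every column follows the year_type naming pattern.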