Filter columns and repeatedly compare two columns in pairs using R


Problem description

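Given the df below, for each year I need to check the direction of change of that year's actual value and of its predicted value, each compared with the previous year's actual value: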

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"), 
    name = c("industry", "agriculture", "service"), `2019_actual` = c(16.78, 
    9.26, 49.38), `2019_pred` = c(17.78, 10.26, NA), `2020_actual` = c(35.74, 
    NA, 49.38), `2020_pred` = c(36.74, 66.56, 25.36), `2021_actual` = c(30.74, 
    83.42, 63.26), `2021_pred` = c(31.74, 84.42, 35.23)), class = "data.frame", row.names = c(NA, 
-3L))

Output:

      code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23

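The logic is: if the difference between the two values is positive, negative or zero, return "increase", "decrease" or "unchanged" respectively; if one or both values are NA, return NA.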
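The rule above can be written as a small vectorised function. This is a minimal sketch; the helper name direction() is hypothetical and not part of the question. It relies on sign() returning -1, 0 or 1 (and NA for NA input), which maps directly onto the three labels.

```r
# Hypothetical helper (not from the question): label the change from
# `prev` to `cur` as "decrease", "unchanged" or "increase".
direction <- function(cur, prev) {
  # sign() gives -1, 0 or 1; adding 2 turns it into index 1, 2 or 3.
  # If either input is NA the index is NA, and indexing with NA returns NA.
  c("decrease", "unchanged", "increase")[sign(cur - prev) + 2]
}

# 2020 actual vs 2019 actual for the three rows of df
direction(c(35.74, NA, 49.38), c(16.78, 9.26, 49.38))
# [1] "increase" NA          "unchanged"
```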

Expected result:

     code        name 2019_actual 2019_pred 2020_actual 2020_pred 2021_actual 2021_pred 2020_act_direction 2020_pred_direction 2021_act_direction
1 M0000273    industry       16.78     17.78       35.74     36.74       30.74     31.74           increase            increase           decrease
2 M0000357 agriculture        9.26     10.26          NA     66.56       83.42     84.42                               increase                   
3 M0000545     service       49.38        NA       49.38     25.36       63.26     35.23          unchanged            decrease           increase
  2021_pred_direction
1            decrease
2                    
3            decrease

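I tried the code below, but it has two problems: 1. it returns the error **Error: unexpected ')' in " )"**; 2. if I have many years (e.g. from 2010 to 2020), this is clearly not the right way to get the expected result.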

df %>% 
  mutate(
    `2020_act_direction` = case_when(`2020_actual` - `2019_actual` > 0 ~ 'increase',
                                   `2020_actual` - `2019_actual` < 0 ~ 'decrease',
                                   `2020_actual` - `2019_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
                                   ),
    `2020_pred_direction` = case_when(`2020_pred` - `2019_actual` > 0 ~ 'increase',
                                   `2020_pred` - `2019_actual` < 0 ~ 'decrease',
                                   `2020_pred` - `2019_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
    `2021_act_direction` = case_when(`2021_actual` - `2020_actual` > 0 ~ 'increase',
                                   `2021_actual` - `2020_actual` < 0 ~ 'decrease',
                                   `2021_actual` - `2020_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
    `2021_pred_direction` = case_when(`2021_pred` - `2020_actual` > 0 ~ 'increase',
                                   `2021_pred` - `2020_actual` < 0 ~ 'decrease',
                                   `2021_pred` - `2020_actual` == 0 ~ 'unchanged',
                                   TRUE ~ NA
    )
  )
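The parse error in this attempt comes from the missing commas after the `2020_pred_direction` and `2021_act_direction` definitions inside mutate(). A corrected version of the same explicit approach might look like the sketch below. It swaps TRUE ~ NA for TRUE ~ NA_character_, because older dplyr releases reject a logical NA mixed with character results in case_when(). It still needs one hand-written block per output column, so it does not address the second problem.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  code = c("M0000273", "M0000357", "M0000545"),
  name = c("industry", "agriculture", "service"),
  `2019_actual` = c(16.78, 9.26, 49.38),
  `2019_pred` = c(17.78, 10.26, NA),
  `2020_actual` = c(35.74, NA, 49.38),
  `2020_pred` = c(36.74, 66.56, 25.36),
  `2021_actual` = c(30.74, 83.42, 63.26),
  `2021_pred` = c(31.74, 84.42, 35.23),
  check.names = FALSE, stringsAsFactors = FALSE
)

out <- df %>%
  mutate(
    # every argument of mutate() must be separated by a comma
    `2020_act_direction` = case_when(
      `2020_actual` - `2019_actual` > 0 ~ "increase",
      `2020_actual` - `2019_actual` < 0 ~ "decrease",
      `2020_actual` - `2019_actual` == 0 ~ "unchanged",
      TRUE ~ NA_character_
    ),
    `2020_pred_direction` = case_when(
      `2020_pred` - `2019_actual` > 0 ~ "increase",
      `2020_pred` - `2019_actual` < 0 ~ "decrease",
      `2020_pred` - `2019_actual` == 0 ~ "unchanged",
      TRUE ~ NA_character_
    ),
    `2021_act_direction` = case_when(
      `2021_actual` - `2020_actual` > 0 ~ "increase",
      `2021_actual` - `2020_actual` < 0 ~ "decrease",
      `2021_actual` - `2020_actual` == 0 ~ "unchanged",
      TRUE ~ NA_character_
    ),
    `2021_pred_direction` = case_when(
      `2021_pred` - `2020_actual` > 0 ~ "increase",
      `2021_pred` - `2020_actual` < 0 ~ "decrease",
      `2021_pred` - `2020_actual` == 0 ~ "unchanged",
      TRUE ~ NA_character_
    )
  )
```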

How can I handle this?

Recommended answer

Use pivot_longer and pivot_wider to get one row per year/code/name. Then you can easily use lag to compare consecutive years.

library(tidyverse)

df <- structure(list(code = c("M0000273", "M0000357", "M0000545"), 
                     name = c("industry", "agriculture", "service"), 
                     `2019_actual` = c(16.78, 9.26, 49.38), 
                     `2019_pred` = c(17.78, 10.26, NA), 
                     `2020_actual` = c(35.74, NA, 49.38), 
                     `2020_pred` = c(36.74, 66.56, 25.36), 
                     `2021_actual` = c(30.74, 83.42, 63.26), 
                     `2021_pred` = c(31.74, 84.42, 35.23)), 
                class = "data.frame", row.names = c(NA, -3L)) %>% 
    as_tibble()

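df %>% 
    # one row per code/name/year, with `actual` and `pred` as columns
    pivot_longer(cols = c(-code, -name), names_to = c("year", "type"), names_sep = "_") %>% 
    pivot_wider(names_from = "type", values_from = "value") %>% 
    mutate(year = as.integer(year)) %>% 
    # sort the years within each code/name so that lag() returns the previous year
    group_by(code, name) %>% 
    arrange(year, .by_group = TRUE) %>% 
    # compare this year's actual and pred with the previous year's actual;
    # an NA on either side matches no condition, so case_when() returns NA
    mutate(act_direction = case_when(actual > lag(actual) ~ "increase",
                                     actual < lag(actual) ~ "decrease",
                                     actual == lag(actual) ~ "unchanged"),
           pred_direction = case_when(pred > lag(actual) ~ "increase",
                                      pred < lag(actual) ~ "decrease",
                                      pred == lag(actual) ~ "unchanged")) %>% 
    ungroup()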
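The answer's result stays in long format (one row per code/name/year). If the wide layout from the expected result is needed, the direction columns can be pivoted back to one column per year and joined onto the original df. This is a minimal sketch, assuming tidyr 1.2.0 or later (for the names_vary argument of pivot_wider()); the intermediate names long, directions and result are illustrative, and nothing in it is specific to 2019 to 2021, so it works for any number of years.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- data.frame(
  code = c("M0000273", "M0000357", "M0000545"),
  name = c("industry", "agriculture", "service"),
  `2019_actual` = c(16.78, 9.26, 49.38),
  `2019_pred` = c(17.78, 10.26, NA),
  `2020_actual` = c(35.74, NA, 49.38),
  `2020_pred` = c(36.74, 66.56, 25.36),
  `2021_actual` = c(30.74, 83.42, 63.26),
  `2021_pred` = c(31.74, 84.42, 35.23),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Long format with the direction columns, as in the answer
long <- df %>%
  pivot_longer(-c(code, name), names_to = c("year", "type"), names_sep = "_") %>%
  pivot_wider(names_from = type, values_from = value) %>%
  mutate(year = as.integer(year)) %>%
  group_by(code, name) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    act_direction = case_when(actual > lag(actual) ~ "increase",
                              actual < lag(actual) ~ "decrease",
                              actual == lag(actual) ~ "unchanged"),
    pred_direction = case_when(pred > lag(actual) ~ "increase",
                               pred < lag(actual) ~ "decrease",
                               pred == lag(actual) ~ "unchanged")
  ) %>%
  ungroup()

# The first year has no previous year to compare with, so drop it, then
# spread the directions into columns named like "2020_act_direction",
# ordered year by year ("slowest" keeps act/pred for the same year together).
directions <- long %>%
  filter(year > min(year)) %>%
  select(code, name, year, act = act_direction, pred = pred_direction) %>%
  pivot_wider(
    names_from = year,
    values_from = c(act, pred),
    names_glue = "{year}_{.value}_direction",
    names_vary = "slowest"
  )

# Attach the direction columns to the original wide data
result <- left_join(df, directions, by = c("code", "name"))
```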
