dplyr: mutate_at + coalesce: dynamic names of columns

Question

I've been trying for a while to combine mutate_at with coalesce in a case where the column names are generated dynamically.

My example has only five columns, but the real data has many more (and not all columns should be included in the coalesce step).

Example data frame:

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

Expected output:

  aa bb cc aa_extra bb_extra
1  1  1  6        2        1
2  2  2  7        2        2
3 NA  2  8       NA        3

Expected output as a structure:

structure(list(aa = c(1, 2, NA), bb = c(1, 2, 2), cc = c(6, 7, 
8), aa_extra = c(2, 2, NA), bb_extra = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L))

I tried something like this, without success (the error was "Only strings can be converted to symbols"). I would like to avoid creating extra variables and keep everything inside the mutate_at expression, since this is part of a longer dplyr "flow".

data_example %>%
  dplyr::mutate_at(
    gsub("_extra", "", grep("_extra$",
                            colnames(.),
                            perl = T,
                            value = T)),
    dplyr::funs(
      dplyr::coalesce(., !!! dplyr::sym(paste0(., "_extra")))
    )
  )

I also tried this (no error, but the values for column bb are wrong):

data_example %>%
  dplyr::mutate_at(
    gsub("_extra", "", grep("_extra$",
                            colnames(.),
                            perl = T,
                            value = T)),
    dplyr::funs(
      dplyr::coalesce(., !!as.name(paste0(names(.), "_extra")))
    )
  )

How can I get the name of the column being processed and pass it to coalesce?
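A note on the second attempt: one likely reason bb comes out wrong is that `names(.)` inside `!!` refers to the piped data frame as a whole, so `paste0()` returns one string per column, and `as.name()` keeps only the first element of a character vector. The base R sketch below reproduces just that step; the vector `cols` is a stand-in for `names(data_example)` and is introduced here for illustration only.

```r
# Hypothetical stand-in for names(data_example)
cols <- c("aa", "bb", "cc", "aa_extra", "bb_extra")

# paste0() is vectorised, so this is a character vector of length 5
candidates <- paste0(cols, "_extra")

# as.name() uses only the first element, so the symbol is always aa_extra
picked <- as.name(candidates)
print(picked)
# aa_extra
```

Under that reading, every selected column is coalesced with aa_extra rather than with its own "_extra" partner, which would explain why aa looks right and bb does not.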

Recommended answer

We can split the dataset into a list of data.frames, grouping the columns by their name with the "_extra" substring removed. Then, with map, we loop through the list and coalesce the columns of each group, and finally bind the result with the "_extra" columns from the original dataset.

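library(tidyverse)
data_example %>%
   # group columns by base name (column name with "_extra" removed)
   split.default(str_remove(names(.), "_extra")) %>%
   # coalesce each group into one column: first non-NA value per row
   map_df(~ coalesce(!!! .x)) %>%
   #or use
   # map_df(reduce, coalesce) %>%
   # re-attach the original "_extra" columns
   bind_cols(., select(data_example, ends_with("extra")))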
# A tibble: 3 x 5
#     aa    bb    cc aa_extra bb_extra
#  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#1     1     1     6        2        1
#2     2     2     7        2        2
#3    NA     2     8       NA        3
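
As a complementary sketch, closer to the original question of passing the processed column's name to coalesce: in dplyr 1.0.0 and later, `across()` supersedes `mutate_at()`, and `cur_column()` returns the name of the column currently being processed. The example below assumes such a dplyr version is installed; the helper variable `base_cols` is introduced here for illustration and is not part of the answer above.

```r
library(dplyr)

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

# Hypothetical helper: base names of the columns that have an "_extra" partner
base_cols <- sub("_extra$", "", grep("_extra$", names(data_example), value = TRUE))

result <- data_example %>%
  mutate(across(
    all_of(base_cols),
    # cur_column() is the name of the column being processed;
    # .data[[...]] looks up its "_extra" partner by that name
    ~ coalesce(.x, .data[[paste0(cur_column(), "_extra")]])
  ))

print(result)
```

This keeps the original column order and leaves cc and the "_extra" columns untouched, so no split or bind step is needed. The `base_cols` expression could also be written inline inside `all_of()` if an extra variable is undesirable.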
