How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs


Question

I have the following issue using R. In short, I want to create multiple new columns in a data frame based on calculations on different column pairs in that data frame.

The data looks as follows:

df <- data.frame(a1 = c(1:5), 
                 b1 = c(4:8), 
                 c1 = c(10:14), 
                 a2 = c(9:13), 
                 b2 = c(3:7), 
                 c2 = c(15:19))
df
a1 b1 c1 a2 b2 c2
1  4 10  9  3 15
2  5 11 10  4 16
3  6 12 11  5 17
4  7 13 12  6 18
5  8 14 13  7 19

The output should look like this:

a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  4 10  9  3 15    10     7    25
2  5 11 10  4 16    12     9    27
3  6 12 11  5 17    14    11    29
4  7 13 12  6 18    16    13    31
5  8 14 13  7 19    18    15    33

I can achieve this with dplyr by doing some manual work in the following way:

df %>% rowwise %>% mutate(sum_a = sum(a1, a2),
                          sum_b = sum(b1, b2),
                          sum_c = sum(c1, c2)) %>% 
  as.data.frame()

So what is being done is: take the columns with the letter "a" in their name, calculate the sum row-wise, and create a new column with that sum named sum_[letter]. Repeat for the columns with each other letter.

This works. However, if I have a large data set with, say, 300 different column pairs, the manual input would be significant, since I would have to write 300 mutate calls.

I recently stumbled upon the R package "purrr", and my guess is that it would let me do what I want in a more automated way.

In particular, I would expect to be able to use purrr::map2, passing it two lists of column names:

  • list1 = all columns containing the number 1
  • list2 = all columns containing the number 2

Then I could calculate the sum of each pair of matching list entries, in the form of:

map2(list1, list2, ~mutate(sum))

However, I am not able to figure out how best to approach this problem using purrr. I am rather new to purrr, so I would really appreciate any help on this issue.

Recommended answer

Here is one option with purrr. We get the unique prefixes of the dataset's column names ('nm1'), use map (from purrr) to loop through those prefixes, select the columns whose names match each prefix in 'nm1', add them together row by row using reduce, and then bind the resulting columns to the original dataset (bind_cols).

library(tidyverse)
nm1 <- names(df) %>% 
          substr(1, 1) %>%
          unique 
nm1 %>% 
     map(~ df %>% 
            select(matches(.x)) %>%
            reduce(`+`)) %>%
            set_names(paste0("sum_", nm1)) %>%
     bind_cols(df, .)
#    a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1  1  4 10  9  3 15    10     7    25
#2  2  5 11 10  4 16    12     9    27
#3  3  6 12 11  5 17    14    11    29
#4  4  7 13 12  6 18    16    13    31
#5  5  8 14 13  7 19    18    15    33
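
The map2 idea from the question can also be made to work. The following is only a minimal sketch, not part of the original answer. It assumes that every prefix has exactly one column ending in 1 and one ending in 2, so that sorting both name vectors lines the pairs up. The names list1, list2, sums and result are illustrative.

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = 4:8, c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

# Assumption: each prefix has exactly one "...1" and one "...2" column.
# sort() keeps the two vectors aligned by prefix (a1/a2, b1/b2, c1/c2).
list1 <- sort(grep("1$", names(df), value = TRUE))
list2 <- sort(grep("2$", names(df), value = TRUE))

# For each pair of names, add the two columns element-wise.
# map2() returns a list with one summed vector per pair.
sums <- map2(list1, list2, ~ df[[.x]] + df[[.y]]) %>%
  set_names(paste0("sum_", sub("1$", "", list1)))

# Attach the new columns to the original data frame.
result <- bind_cols(df, as_tibble(sums))
result
```

Because the pairing relies on the sorted order of the names, it is worth checking that sub("1$", "", list1) and sub("2$", "", list2) are identical before trusting the result on a wider data set.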
