How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs


Question

I have the following issue using R. In short, I want to create multiple new columns in a data frame based on calculations on different column pairs in that data frame.

The data looks as follows:

df <- data.frame(a1 = c(1:5), 
                 b1 = c(4:8), 
                 c1 = c(10:14), 
                 a2 = c(9:13), 
                 b2 = c(3:7), 
                 c2 = c(15:19))
df
a1 b1 c1 a2 b2 c2
1  4 10  9  3 15
2  5 11 10  4 16
3  6 12 11  5 17
4  7 13 12  6 18
5  8 14 13  7 19

The output should look as follows:

a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1  4 10  9  3 15    10     7    25
2  5 11 10  4 16    12     9    27
3  6 12 11  5 17    14    11    29
4  7 13 12  6 18    16    13    31
5  8 14 13  7 19    18    15    33

I can achieve this with dplyr by doing some manual work in the following way:

library(dplyr)

# Compute each sum row by row, one mutate argument per column pair
df %>% rowwise() %>% mutate(sum_a = sum(a1, a2),
                            sum_b = sum(b1, b2),
                            sum_c = sum(c1, c2)) %>% 
  as.data.frame()

So what is being done is: take the columns with the letter "a" in their name, calculate the sum row-wise, and create a new column with that sum named sum_[letter]. Repeat for the columns with each other letter.

This works. However, if I have a large data set with, say, 300 different column pairs, the manual input would be significant, since I would have to write 300 mutate calls.

I recently stumbled upon the R package "purrr", and my guess is that it would let me do what I want in a more automated way.

In particular, I would expect to be able to use purrr::map2, to which I pass two lists of column names:

  • list1 = all columns with the number 1 in their name
  • list2 = all columns with the number 2 in their name

Then I could calculate the sum of each matching pair of list entries, in the form of:

map2(list1, list2, ~mutate(sum))

However, I am not able to figure out how best to approach this problem using purrr. I am rather new to purrr, so I would really appreciate any help on this issue.

Recommended answer

Here is one option with purrr. We get the unique prefixes of the dataset's column names ('nm1'), use map (from purrr) to loop through those prefixes, select the columns that match each prefix, add them together row by row using reduce, and then bind the resulting columns (bind_cols) to the original dataset.

library(tidyverse)
nm1 <- names(df) %>% 
          substr(1, 1) %>%
          unique 
nm1 %>% 
     map(~ df %>% 
            select(matches(.x)) %>%
            reduce(`+`)) %>%
            set_names(paste0("sum_", nm1)) %>%
     bind_cols(df, .)
#    a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1  1  4 10  9  3 15    10     7    25
#2  2  5 11 10  4 16    12     9    27
#3  3  6 12 11  5 17    14    11    29
#4  4  7 13 12  6 18    16    13    31
#5  5  8 14 13  7 19    18    15    33
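
For comparison, the map2 idea from the question can be sketched as follows. This is only an illustrative variant, not part of the answer above. It assumes every column name ends in 1 or 2 and that each "1" column has exactly one "2" partner with the same prefix; the names cols1, cols2, sums and result are made up for this example.

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = 4:8, c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

# Column names ending in 1 and in 2, sorted so that a1 lines up with a2, etc.
# (assumption: the two vectors have the same length and matching prefixes)
cols1 <- sort(grep("1$", names(df), value = TRUE))
cols2 <- sort(grep("2$", names(df), value = TRUE))

# map2() walks both name vectors in parallel and returns one summed vector per pair
sums <- map2(cols1, cols2, ~ df[[.x]] + df[[.y]]) %>%
  set_names(paste0("sum_", sub("1$", "", cols1)))

# Attach the new columns to the original data
result <- bind_cols(df, as.data.frame(sums))
result
```

Because the pairing here is done on exact column names rather than on a regular expression, it does not depend on how matches() interprets the prefix. With prefixes longer than one letter, or prefixes that occur inside other column names, that distinction may matter.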
