How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs
Question
I have the following issue using R. In short, I want to create multiple new columns in a data frame based on calculations on different column pairs in that data frame.
The data looks as follows:
df <- data.frame(a1 = c(1:5),
                 b1 = c(4:8),
                 c1 = c(10:14),
                 a2 = c(9:13),
                 b2 = c(3:7),
                 c2 = c(15:19))
df
a1 b1 c1 a2 b2 c2
1 4 10 9 3 15
2 5 11 10 4 16
3 6 12 11 5 17
4 7 13 12 6 18
5 8 14 13 7 19
The output should look like this:
a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
1 4 10 9 3 15 10 7 25
2 5 11 10 4 16 12 9 27
3 6 12 11 5 17 14 11 29
4 7 13 12 6 18 16 13 31
5 8 14 13 7 19 18 15 33
I can achieve this with dplyr by doing some manual work, in the following way:
df %>%
  rowwise() %>%
  mutate(sum_a = sum(a1, a2),
         sum_b = sum(b1, b2),
         sum_c = sum(c1, c2)) %>%
  as.data.frame()
So what is being done is: take the columns with the letter "a" in their name, calculate the sum row by row, and create a new column holding that sum, named sum_[letter]. Repeat for the columns with the other letters.
This works. However, if I have a large data set with, say, 300 different column pairs, the manual input would be significant, since I would have to write 300 mutate calls.
I recently stumbled upon the R package "purrr", and my guess is that it would let me do what I want in a more automated way.
In particular, I would expect to be able to use purrr::map2, to which I pass two lists of column names:
- list1 = all columns with the number 1 in their name
- list2 = all columns with the number 2 in their name
Then I could calculate the sum of each matching pair of list entries, in the form of:
map2(list1, list2, ~mutate(sum))
However, I have not been able to figure out how best to approach this problem using purrr. I am rather new to purrr, so I would really appreciate any help on this issue.
Recommended answer
Here is one option with purrr. We get the unique prefixes of the names of the dataset ('nm1'), use map (from purrr) to loop through those unique prefixes, select the columns whose names match the current prefix from 'nm1', add those columns together element by element using reduce (which gives the row sums), and then bind the resulting columns (bind_cols) to the original dataset.
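library(tidyverse)

# Unique one-letter prefixes of the column names: "a", "b", "c"
nm1 <- names(df) %>%
  substr(1, 1) %>%
  unique()

# For each prefix, select the matching columns and add them elementwise,
# then name the results sum_<prefix> and bind them to the original data
nm1 %>%
  map(~ df %>%
        select(matches(.x)) %>%
        reduce(`+`)) %>%
  set_names(paste0("sum_", nm1)) %>%
  bind_cols(df, .)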
# a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1 1 4 10 9 3 15 10 7 25
#2 2 5 11 10 4 16 12 9 27
#3 3 6 12 11 5 17 14 11 29
#4 4 7 13 12 6 18 16 13 31
#5 5 8 14 13 7 19 18 15 33
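For comparison, the map2 idea from the question can be sketched as follows. This is only a minimal illustration, not part of the original answer. It assumes every column name ends in 1 or 2, and that the "1" columns and the "2" columns appear in the same prefix order, so that the two name vectors line up pair by pair. The names list1 and list2 follow the question; sums and result are hypothetical names introduced here.

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = 4:8, c1 = 10:14,
                 a2 = 9:13, b2 = 3:7, c2 = 15:19)

# Assumption: names ending in 1 and in 2 come in the same prefix order,
# so list1[i] and list2[i] always belong to the same pair.
list1 <- grep("1$", names(df), value = TRUE)  # "a1" "b1" "c1"
list2 <- grep("2$", names(df), value = TRUE)  # "a2" "b2" "c2"

# Walk both name vectors in parallel and add each pair of columns.
sums <- map2(list1, list2, ~ df[[.x]] + df[[.y]]) %>%
  set_names(paste0("sum_", sub("1$", "", list1)))

# Bind the new sum_<prefix> columns to the original data.
result <- bind_cols(df, as.data.frame(sums))
result
```

If the columns might not be in matching order, sorting both name vectors first (or pairing them by prefix, as the answer above does) would be a safer starting point.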