dplyr mutate specific columns by evaluating lookup cell value
Question
I have explored various options using quosures, symbols, and evaluation, but I can't seem to get the right syntax. Here is an example dataframe.
x <- data.frame("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))
x
A B C D pastecols
1 a z a b B, C
2 b y c d B, D
3 c x e f B, C, D
4 d w g h <NA>
Now suppose I want to paste values from different columns based on the lookup string in pastecols, and I always want to include column A. This is my desired result:
A B C D pastecols result
1 a z a b B, C a z a
2 b y c d B, D b y d
3 c x e f B, C, D c x e f
4 d w g h <NA> d
Ideally this could be done in dplyr. This is the closest I have gotten:
x %>% mutate(result = lapply(lapply(str_split(pastecols, ", "), c, "A"), na.omit))
A B C D pastecols result
1 a z a b B, C B, C, A
2 b y c d B, D B, D, A
3 c x e f B, C, D B, C, D, A
4 d w g h <NA> A
Recommended answer
Here's one way using pmap to do a similar thing. pmap can be used to effectively work on dataframes by row by capturing each row as a named vector; you can then get the desired column names for indexing as cols by selecting them with ["pastecols"].
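As a minimal sketch of the "each row as a named vector" idea, assuming a small hypothetical data frame df that is not part of the original question:

```r
library(purrr)

# Hypothetical two-row data frame, used only for illustration
df <- data.frame(A = c("a", "b"), B = c("z", "y"), stringsAsFactors = FALSE)

# pmap() calls the function once per row, passing each column as a named
# argument; c(...) gathers those arguments into one named character vector
rows <- pmap(df, function(...) c(...))

rows[[1]]        # first row as a named vector
rows[[2]]["B"]   # index a row by column name
```

Because every row arrives as a named vector, a character vector of column names is enough to pick out the cells to paste.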
Most of the anonymous function syntax is not tidyverse stuff, but just basic R stuff. To walk through it:
- Pass the dataframe as the list to the .l argument of pmap_chr. Remember that dataframes are lists of columns!
- Capture all the ... arguments with c(...). Basically we are calling the function with each row of the dataframe as its arguments; now row is a named vector containing the row. Note that if you have list-columns this will break (but so will a lot of other things here, so I assume there aren't any...).
- We can find out which values of row we want from row["pastecols"], but we need to turn (say) "B, C" into c("A", "B", "C") to do that. This next line just adds the "A", replaces missing values with "A", splits into pieces if there are any, and then indexes back down into the list. The `[[` part is just how you do list[[1]] in a pipe chain; it's the prefix form of the operator. You need this because str_split returns a list and we just want the vector.
- Use this cols vector to get the desired values from row and return it, collapsed into a length 1 character vector!
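The column-name step can be tried in isolation. The sketch below wraps it in a hypothetical helper, lookup_cols (a name introduced here for illustration, not part of the answer), and assumes stringr, tidyr, and magrittr are installed:

```r
library(stringr)   # str_c(), str_split()
library(tidyr)     # replace_na()
library(magrittr)  # %>%

# Hypothetical helper: turn one pastecols value into a vector of column names
lookup_cols <- function(pastecols) {
  pastecols %>%
    str_c("A, ", .) %>%    # prepend "A"; an NA input stays NA
    replace_na("A") %>%    # a missing lookup falls back to just "A"
    str_split(", ") %>%    # returns a list of length 1
    `[[`(1)                # prefix form of [[ ]]: pull the vector out of the list
}

lookup_cols("B, C")
lookup_cols(NA_character_)
```

The first call should give the three names A, B, C, and the second only A, which is why the NA row ends up with just the value from column A.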
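library(tidyverse)

tbl <- tibble("A" = letters[1:4], "B" = letters[26:23], "C" = letters[c(1,3,5,7)], "D" = letters[c(2,4,6,8)], "pastecols" = c("B, C","B, D", "B, C, D", NA))

tbl %>%
  mutate(result = pmap_chr(
    .l = .,                # the data frame itself, treated as a list of columns
    .f = function(...){
      row <- c(...)        # one row as a named character vector
      # column names to paste: always "A", plus whatever pastecols lists
      cols <- row["pastecols"] %>% str_c("A, ", .) %>% replace_na("A") %>% str_split(", ") %>% `[[`(1)
      # pull those values out of the row and collapse them into one string
      vals <- row[cols] %>% str_c(collapse = ", ")
      return(vals)
    }
  ))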
#> # A tibble: 4 x 6
#> A B C D pastecols result
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 a z a b B, C a, z, a
#> 2 b y c d B, D b, y, d
#> 3 c x e f B, C, D c, x, e, f
#> 4 d w g h <NA> d
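Note that the question asked for values separated by spaces, while the output above separates them with ", ". A minimal sketch of the same approach with only the collapse separator changed (the variable name out is introduced here for illustration):

```r
library(tidyverse)

tbl <- tibble(A = letters[1:4], B = letters[26:23],
              C = letters[c(1, 3, 5, 7)], D = letters[c(2, 4, 6, 8)],
              pastecols = c("B, C", "B, D", "B, C, D", NA))

out <- tbl %>%
  mutate(result = pmap_chr(
    .l = .,
    .f = function(...) {
      row <- c(...)
      cols <- row["pastecols"] %>% str_c("A, ", .) %>% replace_na("A") %>% str_split(", ") %>% `[[`(1)
      str_c(row[cols], collapse = " ")  # only change: " " instead of ", "
    }
  ))

out$result
```

With this change the result column should match the desired output in the question.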
Created by the reprex package (v0.2.0).