dplyr mutate - How do I pass one row as a function argument?
Question
I'm trying to create a new column in my tibble that collects and formats all words found in all the other columns. I would like to do this using dplyr, if possible. Original data frame:
df <- read.table(text = " columnA columnB
1 A Z
2 B Y
3 C X
4 D W
5 E V
6 F U " )
As a simplified example, I am hoping to do something like:
df %>%
rowwise() %>%
mutate(newColumn = myFunc(.))
And have the output look like this:
columnA columnB newColumn
1 A Z AZ
2 B Y BY
3 C X CX
4 D W DW
5 E V EV
6 F U FU
When I try this in my code, the output looks like:
columnA columnB newColumn
1 A Z ABCDEF
2 B Y ABCDEF
3 C X ABCDEF
4 D W ABCDEF
5 E V ABCDEF
6 F U ABCDEF
myFunc should take one row as its argument, but when I use rowwise() I seem to be passing the entire tibble into the function (I can see this by adding a print() call inside myFunc).
How can I pass just one row and do this iteratively so that it applies the function to every row? Can this be done with dplyr?
myFunc in the example is simplified for the sake of my question. The actual function looks like this:
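library(stringr)
library(tm)  # provides removeWords()

# stopwords_list must already exist (it is not defined in the question)
get_chr_vector <- function(row) {
  row <- row[, 2:ncol(row), drop = FALSE]  # skip the first column
  words <- str_c(as.character(unlist(row)), collapse = ' ')  # works for factor or character columns
  words <- str_to_upper(words)
  words <- unlist(str_split(words, ' '))
  words <- words[words != '']
  words <- words[nchar(words) > 2]  # keep words longer than two characters
  words <- removeWords(words, stopwords_list)  # from the tm library
  paste(words, collapse = ' ')  # return the cleaned words as one string
}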
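Why the attempt above returns ABCDEF on every row: inside mutate(), `.` is the magrittr placeholder for the whole data frame piped in, and rowwise() does not change what `.` refers to, so myFunc(.) always receives all six rows. In dplyr 1.1.0 or later, pick() returns only the current row's columns under rowwise(). A minimal sketch assuming that version, with a simplified stand-in for myFunc (not the asker's real function):

```r
library(dplyr)

df <- data.frame(columnA = LETTERS[1:6], columnB = LETTERS[26:21],
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for myFunc: joins every value of a one-row data frame
myFunc <- function(row) paste0(unlist(row), collapse = "")

out <- df %>%
  rowwise() %>%
  mutate(newColumn = myFunc(pick(everything()))) %>%  # pick() sees only the current row
  ungroup()

print(out)
```

Because pick(everything()) hands the function a one-row data frame, a row-oriented function such as get_chr_vector() could be called the same way.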
Answer
Take a look at ?dplyr::do and ?purrr::map, which allow you to apply arbitrary functions to arbitrary columns and to chain the results through multiple unary operators. For example,
library(dplyr)
df1 <- df %>% rowwise() %>% do(X = as_tibble(.)) %>% ungroup()
# # A tibble: 6 x 1
# X
# * <list>
# 1 <tibble [1 x 2]>
# 2 <tibble [1 x 2]>
# ...
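do() is marked superseded as of dplyr 1.0.0, and nest_by() builds a similar rowwise tibble of nested data. A sketch assuming dplyr 1.0.0 or later; row_id is a hypothetical helper column added only so that every row becomes its own group:

```r
library(dplyr)

df <- data.frame(columnA = LETTERS[1:6], columnB = LETTERS[26:21],
                 stringsAsFactors = FALSE)

df1 <- df %>%
  mutate(row_id = row_number()) %>%  # hypothetical id: one group per row
  nest_by(row_id, .key = "X") %>%    # X holds a 1 x 2 tibble per row
  ungroup()                          # drop the rowwise grouping

print(df1)
```

Because .keep defaults to FALSE, the nested tibbles contain only columnA and columnB, matching the X column produced by do().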
Notice that column X now contains 1x2 data.frames (or tibbles), each holding one row of your original data.frame. You can now pass each one to your custom myFunc using map.
library(purrr)
myFunc <- function(Y) paste0(Y$columnA, Y$columnB)
df1 %>% mutate(Result = map(X, myFunc))
# # A tibble: 6 x 2
# X Result
# <list> <list>
# 1 <tibble [1 x 2]> <chr [1]>
# 2 <tibble [1 x 2]> <chr [1]>
# ...
The Result column now contains the output of myFunc applied to each row of your original data.frame, as desired. You can retrieve the values by chaining a tidyr::unnest operation.
df1 %>% mutate( Result = map(X, myFunc) ) %>% unnest
# # A tibble: 6 x 3
# Result columnA columnB
# <chr> <fctr> <fctr>
# 1 AZ A Z
# 2 BY B Y
# 3 CX C X
# ...
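In tidyr 1.0.0 and later, the list-columns to expand should be named through unnest()'s cols argument; a bare unnest() warns that cols is now required. The sketch below assumes tidyr 1.0.0 or later and builds the one-row list-column X with base split() rather than the superseded do():

```r
library(dplyr)
library(purrr)
library(tidyr)

df <- data.frame(columnA = LETTERS[1:6], columnB = LETTERS[26:21],
                 stringsAsFactors = FALSE)
myFunc <- function(Y) paste0(Y$columnA, Y$columnB)

# One 1 x 2 tibble per row; unname() drops the "1".."6" names split() adds
df1 <- tibble(X = unname(split(as_tibble(df), seq_len(nrow(df)))))

out <- df1 %>%
  mutate(Result = map(X, myFunc)) %>%
  unnest(cols = c(X, Result))  # name the list-columns explicitly

print(out)
```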
If desired, unnest can be limited to specific columns, e.g., unnest(Result).
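Alternatively, purrr::map_chr() returns a plain character vector rather than a list, so the result can go straight into a column with no unnest step. A sketch that uses base split() to build the one-row data frames (a substitution for the do() step, not part of the original answer):

```r
library(purrr)

df <- data.frame(columnA = LETTERS[1:6], columnB = LETTERS[26:21],
                 stringsAsFactors = FALSE)
myFunc <- function(Y) paste0(Y$columnA, Y$columnB)

rows <- split(df, seq_len(nrow(df)))           # list of six 1 x 2 data frames
df$newColumn <- unname(map_chr(rows, myFunc))  # unname() drops the "1".."6" names

print(df)
```

map_chr() raises an error if myFunc returns anything other than a single string, which is a useful check for a row-wise function.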
EDIT: Because your original data.frame contains only two columns, you can actually skip the do step and use purrr::map2 instead. The syntax is very similar to map:
myFunc <- function( a, b ) {paste0(a,b)}
df %>% mutate( Result = map2( columnA, columnB, myFunc ) )
Note that myFunc is now defined as a binary function (one that takes two arguments).
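map2() handles exactly two columns. The asker's full function reads every column after the first, and purrr::pmap_chr() generalises the same idea to any number of columns by passing each column's value for a row as a named argument. A sketch with a hypothetical id column and a hypothetical row_words() helper:

```r
library(purrr)

df <- data.frame(
  id      = 1:6,            # hypothetical first column to skip
  columnA = LETTERS[1:6],
  columnB = LETTERS[26:21],
  stringsAsFactors = FALSE
)

# Hypothetical helper: `...` collects one value per column for the current row
row_words <- function(...) {
  vals <- c(...)                    # named character vector for this row
  paste0(vals[-1], collapse = "")   # drop the first column, join the rest
}

df$newColumn <- pmap_chr(df, row_words)

print(df)
```

Since row_words() never names the columns, adding columns to the data frame requires no change to the function.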