dplyr mutate - How do I pass one row as a function argument?

Problem description

I'm trying to create a new column in my tibble that collects and formats all the words found in all the other columns. I would like to do this with dplyr, if possible. Original data frame:

df <- read.table(text =      "  columnA     columnB      
                 1            A           Z                    
                 2            B           Y                    
                 3            C           X                    
                 4            D           W                    
                 5            E           V                   
                 6            F           U            "  ) 

As a simplified example, I am hoping to do something like:

df %>%
    rowwise() %>%
    mutate(newColumn = myFunc(.))

And have the output look like this:

       columnA     columnB      newColumn
1            A           Z             AZ        
2            B           Y             BY        
3            C           X             CX        
4            D           W             DW        
5            E           V             EV        
6            F           U             FU       

When I try this in my code, the output looks like this:

       columnA     columnB      newColumn
1            A           Z             ABCDEF        
2            B           Y             ABCDEF        
3            C           X             ABCDEF    
4            D           W             ABCDEF    
5            E           V             ABCDEF    
6            F           U             ABCDEF

myFunc should take one row as an argument, but when I use rowwise() I seem to be passing the entire tibble into the function (I can see this by adding a print() call to myFunc).

How can I pass just one row at a time, so that the function is applied to every row? Can this be done with dplyr?
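The ABCDEF output has a simple cause: in a %>% pipeline, a "." nested inside mutate() is magrittr's placeholder for the whole object piped in, so rowwise() never narrows it to a single row. A minimal sketch of one fix, assuming dplyr 1.1.0 or later (the release that introduced pick()): inside a rowwise mutate(), pick(everything()) returns a one-row data frame for the current row. The myFunc below is a hypothetical stand-in that glues a row's values together.

```r
library(dplyr)

df <- data.frame(
  columnA = c("A", "B", "C", "D", "E", "F"),
  columnB = c("Z", "Y", "X", "W", "V", "U"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for myFunc: takes a one-row data frame and
# concatenates all of its values into a single string
myFunc <- function(row) paste0(unlist(row), collapse = "")

result <- df %>%
  rowwise() %>%
  # pick(everything()) yields only the current row's columns; "." would be
  # the whole data frame, which is why every row received "ABCDEF"
  mutate(newColumn = myFunc(pick(everything()))) %>%
  ungroup()

print(result)
```

The same idea works with any function that expects a one-row data frame, since pick() hands it exactly that.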

myFunc in the example is simplified for the sake of the question. The actual function looks like this:

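library(stringr)
library(tm) # for removeWords()

# stopwords_list (a character vector of stop words) is defined elsewhere
get_chr_vector <- function(row) {

    row <- row[, 2:ncol(row)] # I need to skip the first column
    # turn the one-row data frame into a character vector, then one string
    words <- str_c(unlist(lapply(row, as.character)), collapse = ' ')
    words <- str_to_upper(words)
    words <- unlist(str_split(words, ' '))
    words <- words[words != '']
    words <- words[!nchar(words) <= 2]
    words <- removeWords(words, stopwords_list) # from the tm library
    paste(words, collapse = ' ')
}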
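A sketch of how a function like this could be plugged into a rowwise mutate(), assuming dplyr 1.0.0 or later for c_across(). To keep it self-contained, get_words() is a hypothetical simplification of get_chr_vector(): it uses base R string functions, receives the row's values as a character vector rather than a data frame, and drops stop words by exact match with %in% instead of calling tm::removeWords(), so its output can differ from the original. The id column, sample text and stop-word list are made up for illustration.

```r
library(dplyr)

# Made-up example data: the first column is an id that should be skipped
df_text <- data.frame(
  id = 1:2,
  title = c("The quick fox", "A lazy dog"),
  body = c("jumps over it", "sleeps all day"),
  stringsAsFactors = FALSE
)
stopwords_list <- c("THE", "OVER", "ALL")  # hypothetical stop words

# Hypothetical simplification of get_chr_vector(): `values` holds one row's text
get_words <- function(values, stopwords) {
  words <- toupper(paste(values, collapse = " "))
  words <- unlist(strsplit(words, " ", fixed = TRUE))
  words <- words[words != ""]
  words <- words[nchar(words) > 2]        # drop words of 1-2 characters
  words <- words[!words %in% stopwords]   # exact-match stop-word filter
  paste(words, collapse = " ")
}

out <- df_text %>%
  rowwise() %>%
  # c_across(-id) gathers the current row's values from every column except id
  mutate(words = get_words(c_across(-id), stopwords_list)) %>%
  ungroup()

print(out)
```

Selecting the columns with c_across(-id) replaces the row[, 2:ncol(row)] step, so the helper never has to know which column comes first.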

Recommended answer

Take a look at ?dplyr::do (superseded since dplyr 1.0.0, but still functional) and ?purrr::map, which let you apply arbitrary functions to arbitrary columns and chain the results through multiple unary operators. For example:

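library(dplyr) # rowwise(), do(), mutate(); also re-exports as_tibble()
library(purrr) # map(), map2()
library(tidyr) # unnest()

df1 <- df %>% rowwise() %>% do(X = as_tibble(.)) %>% ungroup()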
# # A tibble: 6 x 1
#                  X
# *           <list>
# 1 <tibble [1 x 2]>
# 2 <tibble [1 x 2]>
# ...

Notice that column X now contains 1x2 data frames (tibbles), each holding one row of your original data frame. You can now pass each one to your custom myFunc using map.

myFunc <- function(Y) {paste0( Y$columnA, Y$columnB )}
df1 %>% mutate( Result = map(X, myFunc) )
# # A tibble: 6 x 2
#                  X    Result
#             <list>    <list>
# 1 <tibble [1 x 2]> <chr [1]>
# 2 <tibble [1 x 2]> <chr [1]>
# ...

The Result column now contains the output of myFunc applied to each row of your original data frame, as desired. You can retrieve the values by appending a tidyr::unnest() step to the pipeline.

df1 %>% mutate( Result = map(X, myFunc) ) %>% unnest
# # A tibble: 6 x 3
#   Result columnA columnB
#    <chr>  <fctr>  <fctr>
# 1     AZ       A       Z
# 2     BY       B       Y
# 3     CX       C       X
# ...

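If desired, unnest can be limited to specific columns, e.g., unnest(Result). In tidyr 1.0.0 and later the columns to unnest should always be named explicitly, e.g., unnest(c(X, Result)); calling unnest() without them still works but emits a deprecation warning, and the column order of the result may differ from the output shown above.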
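If the list column and the unnest step are not wanted, purrr::map_chr() returns a plain character vector, provided myFunc returns exactly one string per row. A sketch, not part of the original answer, that also replaces the do() step by splitting the data frame into one-row data frames with base R's split(); the variable name rows is illustrative.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  columnA = c("A", "B", "C", "D", "E", "F"),
  columnB = c("Z", "Y", "X", "W", "V", "U"),
  stringsAsFactors = FALSE
)

myFunc <- function(Y) paste0(Y$columnA, Y$columnB)

# split() over row indices yields a list of one-row data frames,
# playing the role of the X column built with do()
rows <- split(df, seq_len(nrow(df)))

# map_chr() errors unless every call returns a single string; split() names
# the list "1".."6", so unname() drops those names from the result
result <- df %>% mutate(Result = unname(map_chr(rows, myFunc)))

print(result)
```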

EDIT: Because your original data frame contains only two columns, you can actually skip the do step and use purrr::map2 instead. The syntax is very similar to map:

myFunc <- function(a, b) {paste0(a, b)}
# map2_chr() returns a character column; plain map2() would return a list column
df %>% mutate(Result = map2_chr(columnA, columnB, myFunc))

Note that myFunc is now defined as a binary function (it takes two arguments).
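map2() stops at two columns. A sketch of the n-column generalisation with purrr::pmap_chr(), which walks the columns of a data frame in parallel and passes one row's values to the function as named arguments; this is an alternative not given in the original answer, and row_to_string is a hypothetical name.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  columnA = c("A", "B", "C", "D", "E", "F"),
  columnB = c("Z", "Y", "X", "W", "V", "U"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pmap passes one row's values as named arguments
# (e.g. columnA = "A", columnB = "Z"), which ... collects
row_to_string <- function(...) paste0(unlist(list(...)), collapse = "")

# pmap_chr() calls row_to_string once per row of df and returns a character vector
result <- mutate(df, newColumn = pmap_chr(df, row_to_string))

print(result)
```

For this particular myFunc no iteration is needed at all: paste0() is vectorised, so mutate(newColumn = paste0(columnA, columnB)) produces the same column in a single call.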
