r - How to use iteration on a custom function that uses dplyr


Question

I want to create a custom function to calculate grouped percentages in a large dataset with 100+ columns. Because I have so many columns I want to do a loop or lapply or something to avoid typing the function out 100+ times. The function I wrote works fine when I type it in individually for each column, but I cannot figure out how to do it repeatedly.

Here's a simplified dataframe and function:

# load required libraries:
library(tidyverse)

df<-data.frame(sex=c('M','M','M','F','M','F','M',NA),
              school=c('A','A','A','A','B','B','B',NA),
              question1=c(NA,1,1,2,2,3,3,3),
              question2=c(2,NA,2,4,5,1,2,3))

my_function<-function(dataset,question_number){

  question_number_enquo<-enquo(question_number)

  dataset%>%
    filter(!is.na(!!question_number_enquo)&!is.na(sex))%>%
    group_by(school,sex,!!question_number_enquo)%>%
    count(!!question_number_enquo)%>%
    summarise(number=sum(n))%>%
    mutate(percent=number/sum(number)*100)%>%
    ungroup()
}

My function works when I type a column name into it:

my_function(df,question1)

# A tibble: 5 x 5
  school sex   question1 number percent
  <fct>  <fct>     <dbl>  <int>   <dbl>
1 A      F             2      1     100
2 A      M             1      2     100
3 B      F             3      1     100
4 B      M             2      1      50
5 B      M             3      1      50

Here's what I've tried in terms of reiteration. I want to repeat the function for every column (except for school and sex, because those are my groups).

question_col_names<-(df%>%select(-sex,-school)%>%colnames())

Using lapply with the column names as a quosure:

question_col_names_enquo<-enquo(question_col_names)
lapply(df,my_function(df,!!question_col_names_enquo))


 Error: Column `<chr>` must be length 7 (the number of rows) or one, not 2

Trying lapply with unquoted column names:

lapply(df,my_function(df,question_col_names))

Error: Column `question_col_names` is unknown

Trying lapply with quoted column names:

lapply(df,my_function(df,'question_col_names'))

Error: Column `"question_col_names"` can't be modified because it's a grouping variable

I also tried apply, and got the same types of error messages:

apply(df,1,my_function(df,!!question_col_names_enquo))
Error: Column `<chr>` must be length 7 (the number of rows) or one, not 2

apply(df,1,my_function(df,question_col_names))
Error: Column `question_col_names` is unknown

apply(df,1,my_function(df,'question_col_names'))
Error: Column `"question_col_names"` can't be modified because it's a grouping variable

I also tried different variations of a for loop:

for (i in question_col_names){
  my_function(df,i)
}

Error: Column `i` is unknown


for (i in question_col_names){
   my_function(df,'i')
 }
Error: Column `"i"` can't be modified because it's a grouping variable

How can I use iteration to get my function to repeat over all my columns?

I suspect that this has to do with dplyr; I know that it acts funny in custom functions, but I can get it to work in my function, just not in the iteration. I've done a deep dive on Google and Stack Overflow but haven't found anything that answered this.

Thanks in advance!

Recommended answer

Your question_col_names are strings, but enquo() expects a bare column name typed into the call. Use sym() inside the function instead: it converts each string into a symbol, which can then be unquoted with !! in the dplyr verbs.
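As a small illustration of the difference (a sketch only; captured_label is a hypothetical helper, not part of the original answer), the snippet below shows what enquo() actually captures when it is handed a loop variable versus a symbol built from the string. It assumes the rlang package is installed.

```r
library(rlang)

# Hypothetical helper: returns, as a string, the expression that
# enquo() captured for the argument `x`.
captured_label <- function(x) as_label(enquo(x))

i <- "question1"

# enquo() captures the expression as typed, i.e. the symbol `i`,
# which is why dplyr reported that column `i` is unknown.
label_bare <- captured_label(i)

# sym() turns the string into a symbol and !! splices it in,
# so the captured expression is now the column name itself.
label_symbol <- captured_label(!!sym(i))

print(label_bare)
print(label_symbol)
```

Passing the quoted string 'i' fails for a related reason: the string is treated as a constant value, not as a column reference.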

library(tidyverse)

df <- data.frame(
  sex = c("M", "M", "M", "F", "M", "F", "M", NA),
  school = c("A", "A", "A", "A", "B", "B", "B", NA),
  question1 = c(NA, 1, 1, 2, 2, 3, 3, 3),
  question2 = c(2, NA, 2, 4, 5, 1, 2, 3)
)

my_function <- function(dataset, question_number) {
  question_number_enquo <- sym(question_number)

  dataset %>%
    filter(!is.na(!!question_number_enquo) & !is.na(sex)) %>%
    group_by(school, sex, !!question_number_enquo) %>%
    count(!!question_number_enquo) %>%
    summarise(number = sum(n)) %>%
    mutate(percent = number / sum(number) * 100) %>%
    ungroup()
}

my_function(df, "question1")
#> # A tibble: 5 x 5
#>   school sex   question1 number percent
#>   <fct>  <fct>     <dbl>  <int>   <dbl>
#> 1 A      F             2      1     100
#> 2 A      M             1      2     100
#> 3 B      F             3      1     100
#> 4 B      M             2      1      50
#> 5 B      M             3      1      50

question_col_names <- (df %>% select(-sex, -school) %>% colnames())

result <- map_df(question_col_names, ~ my_function(df, .x))
result
#> # A tibble: 10 x 6
#>    school sex   question1 number percent question2
#>    <fct>  <fct>     <dbl>  <int>   <dbl>     <dbl>
#>  1 A      F             2      1     100        NA
#>  2 A      M             1      2     100        NA
#>  3 B      F             3      1     100        NA
#>  4 B      M             2      1      50        NA
#>  5 B      M             3      1      50        NA
#>  6 A      F            NA      1     100         4
#>  7 A      M            NA      2     100         2
#>  8 B      F            NA      1     100         1
#>  9 B      M            NA      1      50         2
#> 10 B      M            NA      1      50         5
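If the enquo() version of the function should keep accepting bare column names, one possible alternative is to do the string-to-symbol conversion at the call site. The sketch below is an assumption-laden variant, not the answer's method: my_function_enquo is a hypothetical name, and the body is condensed to a single count() call (assuming a dplyr version where count() accepts the name argument) that should give the same per-school, per-sex percentages.

```r
library(dplyr)
library(purrr)
library(rlang)

df <- data.frame(
  sex = c("M", "M", "M", "F", "M", "F", "M", NA),
  school = c("A", "A", "A", "A", "B", "B", "B", NA),
  question1 = c(NA, 1, 1, 2, 2, 3, 3, 3),
  question2 = c(2, NA, 2, 4, 5, 1, 2, 3)
)

# Hypothetical variant that still takes a bare column name via enquo().
my_function_enquo <- function(dataset, question_number) {
  question_quo <- enquo(question_number)

  dataset %>%
    filter(!is.na(!!question_quo) & !is.na(sex)) %>%
    count(school, sex, !!question_quo, name = "number") %>%
    group_by(school, sex) %>%
    mutate(percent = number / sum(number) * 100) %>%
    ungroup()
}

question_col_names <- setdiff(names(df), c("sex", "school"))

# Convert each string to a symbol and unquote it in the call, so that
# enquo() sees the column name and not the loop variable.
results <- map(
  set_names(question_col_names),
  function(col) my_function_enquo(df, !!sym(col))
)

print(results$question1)
```

Keeping the results as a named list avoids the NA-padded columns that appear when tables with different question columns are row-bound.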

It is probably better to convert the function result to long format, so that every question ends up in the same two columns and no NA padding is needed:

my_function2 <- function(dataset, question_number) {
  question_number_enquo <- sym(question_number)

  res <- dataset %>%
    filter(!is.na(!!question_number_enquo) & !is.na(sex)) %>%
    group_by(school, sex, !!question_number_enquo) %>%
    count(!!question_number_enquo) %>%
    summarise(number = sum(n)) %>%
    mutate(percent = number / sum(number) * 100) %>%
    ungroup() %>% 
    gather(key = 'question', value, -school, -sex, -number, -percent)
  return(res)

}

result2 <- map_df(question_col_names, ~ my_function2(df, .x))
result2
#> # A tibble: 10 x 6
#>    school sex   number percent question  value
#>    <fct>  <fct>  <int>   <dbl> <chr>     <dbl>
#>  1 A      F          1     100 question1     2
#>  2 A      M          2     100 question1     1
#>  3 B      F          1     100 question1     3
#>  4 B      M          1      50 question1     2
#>  5 B      M          1      50 question1     3
#>  6 A      F          1     100 question2     4
#>  7 A      M          2     100 question2     2
#>  8 B      F          1     100 question2     1
#>  9 B      M          1      50 question2     2
#> 10 B      M          1      50 question2     5

Created on 2019-11-25 by the reprex package (v0.3.0)
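Since the long format is the end goal, iteration can possibly be skipped altogether by reshaping first and grouping by question afterwards. This is only a sketch of that idea, not something the original answer proposes; it assumes tidyr 1.0.0 or later (for pivot_longer) and that all question columns share one type. The names result_long, question, value, and number are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  sex = c("M", "M", "M", "F", "M", "F", "M", NA),
  school = c("A", "A", "A", "A", "B", "B", "B", NA),
  question1 = c(NA, 1, 1, 2, 2, 3, 3, 3),
  question2 = c(2, NA, 2, 4, 5, 1, 2, 3)
)

result_long <- df %>%
  # Stack every question column into question/value pairs.
  pivot_longer(
    cols = -c(sex, school),
    names_to = "question",
    values_to = "value"
  ) %>%
  # Drop missing answers and rows without a recorded sex.
  filter(!is.na(value), !is.na(sex)) %>%
  # Count each answer within school, sex, and question.
  count(school, sex, question, value, name = "number") %>%
  # Percentages are taken within each school/sex/question group.
  group_by(school, sex, question) %>%
  mutate(percent = number / sum(number) * 100) %>%
  ungroup()

print(result_long)
```

With 100+ columns this does a single pass over the data, at the cost of a much taller intermediate table.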
