How to pass column names into a function dplyr

Question

I'm trying to create a simple summary function to speed up the reporting of multiple columns of data for use in an R Markdown file.

var1 is a categorical column of data, t_var is an integer representing the quarter of data, and dt is the full data.

summarise_data_categorical <- function(var1, t_var, dt){

  print(var1)
  print(t_var)

  #Select the columns to aggregate
  group_func <- dt %>% 
    select(one_of(t_var, var1)) %>%
    group_by(t_var,var1)

  #create simple count summary
  count_table <- group_func %>%
    summarise(count = n()) %>%
    spread(t_var, count)

  #create a frequency version of the same table...
  freq <- dt %>%
    select(t_var, var1) %>%
    group_by(t_var,var1) %>%
    summarise(count = n()) %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)

  #Present that table
  freq_table <- freq %>%
    spread(t_var, freq)

  #Create the chart to do the same thing..
  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes(x=t_var, y = freq, colour=var1))

  #Compile outputs as a list
  results <- list(count_table, freq_table, freq_chart)

  #Return list
  results

}

Say I have a data frame:

fr <- data.frame(lets = sample(LETTERS, 100, replace=TRUE),
           `quarter type` = sample(1:4, 100, replace=TRUE))

If I run the function, thus:

summarise_data_categorical("lets", "quarter type", fr)

The initial output is promising:

[1] "lets"
[1] "quarter type"

(NOTE: in trying to recreate the data, for some reason I also receive the warning:

Unknown variables: quarter type

although this doesn't appear with my original data.)

Mainly, I get the error:

Error in resolve_vars(new_groups, tbl_vars(.data)) : unknown variable to group by : t_var

Having come from Python, I'm still a bit confused on how to refer to columns. Can someone explain how I can fix what I've got wrong?

Recommended answer

We can use the new quosures from the devel version of dplyr (soon to be released in 0.6.0)

summarise_data_categorical <- function(var1, t_var, dt){

  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var) 

  dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n()) 

}
# fr is assumed here to have a column named quartertype (no space in the name)
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]

#   quartertype   lets count
#         <int> <fctr> <int>
#1            1      A     1
#2            1      F     2
#3            1      G     2
#4            1      H     1
#5            1      I     1
#6            1      J     4
#7            1      M     3
#8            1      N     1
#9            1      P     1
#10           1      S     5
# ... with 55 more rows

enquo works much like substitute from base R: it captures the input argument and converts it to a quosure. one_of takes string arguments, so the quosures are converted to strings with quo_name. Inside group_by/summarise/mutate etc., we can evaluate a quosure by unquoting it (UQ or !!).
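The three pieces described above (enquo, quo_name, and !!) can be seen in isolation in a smaller function. This is only a minimal sketch: the helper count_by and the data frame toy are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical helper (not from the original post): counts rows per group,
# with the grouping column passed as a bare (unquoted) name.
count_by <- function(dt, group_col) {
  group_col <- enquo(group_col)    # capture the bare column name as a quosure
  col_name <- quo_name(group_col)  # the same column name as a string

  dt %>%
    select(one_of(col_name)) %>%   # one_of() expects strings
    group_by(!!group_col) %>%      # !! unquotes the quosure inside a dplyr verb
    summarise(count = n())
}

# Small made-up data frame for illustration
toy <- data.frame(grp = c("a", "b", "a", "a"), stringsAsFactors = FALSE)
res <- count_by(toy, grp)
print(res)
```

Because the column is captured with enquo, the caller writes count_by(toy, grp) rather than count_by(toy, "grp").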

The quosures seem to work fine with dplyr, though we have some difficulty implementing the same with tidyr functions. The following code should work for the complete function:

summarise_data_categorical <- function(var1, t_var, dt){

  var1 <- enquo(var1)
  t_var <- enquo(t_var)

  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)

  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())

  count_table <- Summ_func %>%
    spread_(v2, "count")

  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count), 3) * 100) %>%
    select(-count)

  freq_table <- freq %>%
    spread_(v2, "freq")

  freq_chart <- freq %>%
    ggplot() +
    geom_line(mapping = aes_string(x = v2, y = "freq", colour = v1))

  results <- list(count_table, freq_table, freq_chart)
  results

}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
#     lets   `1`   `2`   `3`   `4`
#*  <fctr> <int> <int> <int> <int>
#1       A    NA    NA     1     2
#2       B     2    NA    NA     1
#3       C     1     5     1     2
#4       E     1     1    NA    NA
#5       G    NA     1     2     2
#6       H     1    NA     1     1
#7       I    NA     1     1     2
#8       J     2     1     1     1
#9       K     1     1     2     1
#10      L    NA     2    NA    NA
# ... with 14 more rows

#[[2]]
# A tibble: 24 × 5
#     lets   `1`   `2`   `3`   `4`
#*  <fctr> <dbl> <dbl> <dbl> <dbl>
#1       A    NA    NA   3.1   9.5
#2       B   8.7    NA    NA   4.8
#3       C   4.3  20.8   3.1   9.5
#4       E   4.3   4.2    NA    NA
#5       G    NA   4.2   6.2   9.5
#6       H   4.3    NA   3.1   4.8
#7       I    NA   4.2   3.1   9.5
#8       J   8.7   4.2   3.1   4.8
#9       K   4.3   4.2   6.2   4.8
#10      L    NA   8.3    NA    NA
## ... with 14 more rows

#[[3]]
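
The answer above takes bare column names. If the names have to arrive as strings, as in the question's original call, one possible alternative is to group by name with across() and all_of(). This is a hedged sketch that assumes dplyr 1.0.0 or later; the function count_by_names and the data frame toy are hypothetical and not part of the original answer.

```r
library(dplyr)

# Hypothetical variant (not from the original answer): the column names
# are passed as character strings instead of bare names.
count_by_names <- function(dt, var1, t_var) {
  dt %>%
    group_by(across(all_of(c(t_var, var1)))) %>%  # pick grouping columns by name
    summarise(count = n(), .groups = "drop")      # one row per group, ungrouped
}

# Small made-up data frame for illustration
toy <- data.frame(
  lets = c("A", "A", "B", "B"),
  quartertype = c(1L, 1L, 1L, 2L),
  stringsAsFactors = FALSE
)
res2 <- count_by_names(toy, "lets", "quartertype")
print(res2)
```

all_of() raises an error if a requested column is missing, which surfaces a misspelled name immediately instead of silently dropping it.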
