How to pass column names into a function with dplyr
Question
I'm trying to create a simple summary function to speed up the reporting of multiple columns of data for use in an R Markdown file.
var1 is a categorical column of data, t_var is an integer representing the quarter of data, and dt is the full data.
summarise_data_categorical <- function(var1, t_var, dt){
  print(var1)
  print(t_var)
  #Select the columns to aggregate
  group_func <- dt %>%
    select(one_of(t_var, var1)) %>%
    group_by(t_var,var1)
  #create simple count summary
  count_table <- group_func %>%
    summarise(count = n()) %>%
    spread(t_var, count)
  #create a frequency version of the same table...
  freq <- dt %>%
    select(t_var, var1) %>%
    group_by(t_var,var1) %>%
    summarise(count = n()) %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)
  #Present that table
  freq_table <- freq %>%
    spread(t_var, freq)
  #Create the chart to do the same thing..
  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes(x=t_var, y = freq, colour=var1))
  #Compile outputs as a list
  results <- list(count_table, freq_table, freq_chart)
  #Return list
  results
}
Say I have a data frame:
fr <- data.frame(lets = sample(LETTERS, 100, replace=TRUE),
`quarter type` = sample(1:4, 100, replace=TRUE))
If I run the function like this:
summarise_data_categorical("lets", "quarter type", fr)
The initial output looks promising:
[1] "lets"
[1] "quarter type"
(NOTE: in trying to recreate the data, for some reason I also receive the warning:
Unknown variables: quarter type
although this doesn't appear with my original data.)
The main problem is that I get this error:
Error in resolve_vars(new_groups, tbl_vars(.data)) : unknown variable to group by : t_var
Having come from Python, I'm still a bit confused about how to refer to columns. Can someone explain how I can fix what I've got wrong?
Recommended answer
We can use the new quosures from the devel version of dplyr (soon to be released in 0.6.0).
summarise_data_categorical <- function(var1, t_var, dt){
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)
  dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())
}
# note: this call assumes fr has a column named quartertype (no space in the name)
summarise_data_categorical(lets, quartertype, fr)
#Source: local data frame [65 x 3]
#Groups: quartertype [?]
# quartertype lets count
# <int> <fctr> <int>
#1 1 A 1
#2 1 F 2
#3 1 G 2
#4 1 H 1
#5 1 I 1
#6 1 J 4
#7 1 M 3
#8 1 N 1
#9 1 P 1
#10 1 S 5
# ... with 55 more rows
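A side note on the "Unknown variables: quarter type" warning from the question: by default data.frame() passes column names through make.names() (check.names = TRUE), so a backticked `quarter type` ends up as quarter.type. The sketch below uses only base R; the object names a and b are just for illustration.

```r
# Default behaviour: the space is replaced to make a syntactically valid name
a <- data.frame(`quarter type` = 1:2)
names(a)

# check.names = FALSE keeps the name exactly as written
b <- data.frame(`quarter type` = 1:2, check.names = FALSE)
names(b)
```

With the default, a string such as "quarter type" no longer matches any column, which is consistent with the warning shown in the question.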
enquo does a job similar to substitute from base R: it takes the input arguments and converts them to quosures. one_of takes string arguments, so the quosures are converted to strings with quo_name. Inside group_by/summarise/mutate etc., we can evaluate a quosure by unquoting it (UQ or !!).
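To make the three pieces concrete, here is a minimal sketch on the built-in mtcars data. The helper name count_by is hypothetical and only for illustration; it assumes a dplyr version that re-exports enquo and quo_name (0.7.0 or later).

```r
library(dplyr)

# Hypothetical helper, for illustration only: count rows per value of one column
count_by <- function(dt, col) {
  col <- enquo(col)            # capture the unevaluated column expression as a quosure
  col_name <- quo_name(col)    # the same column as a plain string, e.g. "cyl"
  message("Grouping by: ", col_name)
  dt %>%
    group_by(!!col) %>%        # unquote so group_by sees the column itself
    summarise(count = n())
}

# The column is passed as a bare name, not a string
res <- count_by(mtcars, cyl)
```

Without the !!, group_by would look for a column literally named col, which is the same kind of failure as the "unknown variable to group by : t_var" error in the question.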
The quosures seem to work fine with dplyr, though we had some difficulty implementing the same thing with tidyr functions. The following should work for the full code:
summarise_data_categorical <- function(var1, t_var, dt){
  var1 <- enquo(var1)
  t_var <- enquo(t_var)
  v1 <- quo_name(var1)
  v2 <- quo_name(t_var)
  Summ_func <- dt %>%
    select(one_of(v1, v2)) %>%
    group_by(!!t_var, !!var1) %>%
    summarise(count = n())
  count_table <- Summ_func %>%
    spread_(v2, "count")
  freq <- Summ_func %>%
    mutate(freq = round(count / sum(count),3)*100) %>%
    select(-count)
  freq_table <- freq %>%
    spread_(v2, "freq")
  freq_chart <- freq %>%
    ggplot()+
    geom_line(mapping=aes_string(x= v2 , y = "freq", colour= v1))
  results <- list(count_table, freq_table, freq_chart)
  results
}
summarise_data_categorical(lets, quartertype, fr)
#[[1]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <int> <int> <int> <int>
#1 A NA NA 1 2
#2 B 2 NA NA 1
#3 C 1 5 1 2
#4 E 1 1 NA NA
#5 G NA 1 2 2
#6 H 1 NA 1 1
#7 I NA 1 1 2
#8 J 2 1 1 1
#9 K 1 1 2 1
#10 L NA 2 NA NA
# ... with 14 more rows
#[[2]]
# A tibble: 24 × 5
# lets `1` `2` `3` `4`
#* <fctr> <dbl> <dbl> <dbl> <dbl>
#1 A NA NA 3.1 9.5
#2 B 8.7 NA NA 4.8
#3 C 4.3 20.8 3.1 9.5
#4 E 4.3 4.2 NA NA
#5 G NA 4.2 6.2 9.5
#6 H 4.3 NA 3.1 4.8
#7 I NA 4.2 3.1 9.5
#8 J 8.7 4.2 3.1 4.8
#9 K 4.3 4.2 6.2 4.8
#10 L NA 8.3 NA NA
## ... with 14 more rows
#[[3]]
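If you would rather keep passing the column names as strings, as in the original call, a possible alternative is sketched below. This is not part of the answer above: it assumes dplyr 1.0.0 or later (for across() and the .groups argument) and tidyr 1.0.0 or later (for pivot_wider()), and the names count_table_by_strings and fr2 are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical variant that takes column names as strings
count_table_by_strings <- function(var1, t_var, dt) {
  dt %>%
    # all_of() selects the columns named by the strings; across() hands them to group_by
    group_by(across(all_of(c(t_var, var1)))) %>%
    summarise(count = n(), .groups = "drop") %>%
    # one column per value of t_var, filled with the counts
    pivot_wider(names_from = all_of(t_var), values_from = "count")
}

# Small fixed data set; check.names = FALSE keeps the space in the column name
fr2 <- data.frame(lets = c("A", "A", "B", "B", "B"),
                  `quarter type` = c(1L, 1L, 1L, 2L, 2L),
                  check.names = FALSE)

res <- count_table_by_strings("lets", "quarter type", fr2)
```

Because the names stay strings throughout, this also handles column names containing spaces, at the cost of requiring newer package versions than the quosure-based answer.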