R help - function on multiple data frame columns
Problem description
I would like to use a function to repeat a set of procedures on four columns in a data frame. Ultimately I need a long data frame containing all the output. Here is my data frame:
> sample_data
# A tibble: 10 x 7
REVENUEID AMOUNT YEAR REPORT_CODE PAYMENT_METHOD INBOUND_CHANNEL AMOUNT_CAT
<chr> <dbl> <chr> <chr> <chr> <chr> <fctr>
1 rev-24985629 30 FY18 S Check Mail [25,50)
2 rev-22812413 1 FY16 Q Other Canvassing [0.01,10)
3 rev-23508794 100 FY17 Q Credit card Web [100,250)
4 rev-23506121 300 FY17 S Credit card Mail [250,500)
5 rev-23550444 100 FY17 S Credit card Web [100,250)
6 rev-21508672 25 FY14 J Check Mail [25,50)
7 rev-24981769 500 FY18 S Credit card Web [500,1e+03)
8 rev-23503684 50 FY17 R Check Mail [50,75)
9 rev-24982087 25 FY18 R Check Mail [25,50)
10 rev-24979834 50 FY18 R Credit card Web [50,75)
Here is my code:
AMOUNT_CAT<- sample_data %>% group_by(AMOUNT_CAT,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=AMOUNT_CAT) %>% mutate(REPORT_CATEGORY="AMOUNT_CAT")
INBOUND_CHANNEL<- sample_data %>% group_by(INBOUND_CHANNEL,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=INBOUND_CHANNEL) %>% mutate(REPORT_CATEGORY="INBOUND_CHANNEL")
PAYMENT_METHOD<- sample_data %>% group_by(PAYMENT_METHOD,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=PAYMENT_METHOD) %>% mutate(REPORT_CATEGORY="PAYMENT_METHOD")
REPORT_CODE<- sample_data %>% group_by(REPORT_CODE,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=REPORT_CODE) %>% mutate(REPORT_CATEGORY="REPORT_CODE")
final_product<-bind_rows(REPORT_CODE,PAYMENT_METHOD,INBOUND_CHANNEL,AMOUNT_CAT)
Here is the final product of that code:
> final_product
# A tibble: 27 x 5
# Groups: REPORT_VALUE [16]
REPORT_CATEGORY REPORT_VALUE YEAR num total
<chr> <chr> <chr> <int> <dbl>
1 REPORT_CODE J FY14 1 25
2 REPORT_CODE Q FY16 1 1
3 REPORT_CODE Q FY17 1 100
4 REPORT_CODE R FY17 1 50
5 REPORT_CODE R FY18 2 75
6 REPORT_CODE S FY17 2 400
7 REPORT_CODE S FY18 2 530
8 PAYMENT_METHOD Check FY14 1 25
9 PAYMENT_METHOD Check FY17 1 50
10 PAYMENT_METHOD Check FY18 2 55
# ... with 17 more rows
Here is my attempt to condense the code to make it smarter and more efficient (it doesn't work):
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
repeat_procs <- lapply(cat.list, function(x) x <- sample_data %>% group_by(x,YEAR) %>% summarize(num=n(),total=sum(AMOUNT)) %>% rename(REPORT_VALUE=x) %>% mutate(REPORT_CATEGORY="x")
Can someone please advise me on how to write "smarter" code that doesn't repeat as often?
Thanks!
Recommended answer
You need to convert the strings to symbols (rlang::sym) and unquote them with !! in group_by and rename, as in the code that follows. Also note that cat.list is already a character vector, so x already holds the column name as a string and there is no need to put double quotes around x in mutate:
library(dplyr)
library(rlang)

cat.list <- c("REPORT_CODE", "PAYMENT_METHOD", "INBOUND_CHANNEL", "AMOUNT_CAT")

# For each column name, summarise by that column and YEAR, then stack the results.
repeat_procs <- lapply(cat.list, function(x) {
  sample_data %>%
    group_by(!!sym(x), YEAR) %>%                    # string -> symbol, then unquote
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!sym(x)) %>%
    mutate(REPORT_CATEGORY = x)                     # x is already a string
}) %>%
  bind_rows()
Result:
> repeat_procs
# A tibble: 27 x 5
# Groups: REPORT_VALUE [16]
REPORT_VALUE YEAR num total REPORT_CATEGORY
<chr> <fctr> <int> <int> <chr>
1 J FY14 1 25 REPORT_CODE
2 Q FY16 1 1 REPORT_CODE
3 Q FY17 1 100 REPORT_CODE
4 R FY17 1 50 REPORT_CODE
5 R FY18 2 75 REPORT_CODE
6 S FY17 2 400 REPORT_CODE
7 S FY18 2 530 REPORT_CODE
8 Check FY14 1 25 PAYMENT_METHOD
9 Check FY17 1 50 PAYMENT_METHOD
10 Check FY18 2 55 PAYMENT_METHOD
# ... with 17 more rows
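To try the approach above without the original data set, here is a minimal, self-contained sketch. It rebuilds the ten sample rows from the question (REVENUEID is left out because it is not used) and wraps the pipeline in a helper. The helper name summarise_by is hypothetical, and the .groups = "drop" argument assumes dplyr 1.0.0 or later; neither appears in the original answer. REPORT_VALUE is converted to character so that the factor column AMOUNT_CAT stacks cleanly with the character columns.

```r
library(dplyr)
library(rlang)

# The ten sample rows shown in the question (REVENUEID omitted).
sample_data <- tibble(
  AMOUNT = c(30, 1, 100, 300, 100, 25, 500, 50, 25, 50),
  YEAR = c("FY18", "FY16", "FY17", "FY17", "FY17",
           "FY14", "FY18", "FY17", "FY18", "FY18"),
  REPORT_CODE = c("S", "Q", "Q", "S", "S", "J", "S", "R", "R", "R"),
  PAYMENT_METHOD = c("Check", "Other", "Credit card", "Credit card", "Credit card",
                     "Check", "Credit card", "Check", "Check", "Credit card"),
  INBOUND_CHANNEL = c("Mail", "Canvassing", "Web", "Mail", "Web",
                      "Mail", "Web", "Mail", "Mail", "Web"),
  AMOUNT_CAT = factor(c("[25,50)", "[0.01,10)", "[100,250)", "[250,500)", "[100,250)",
                        "[25,50)", "[500,1e+03)", "[50,75)", "[25,50)", "[50,75)"))
)

# Hypothetical helper: summarise `data` by the column named in `col` and by YEAR.
summarise_by <- function(data, col) {
  data %>%
    group_by(!!sym(col), YEAR) %>%
    # .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped result
    summarise(num = n(), total = sum(AMOUNT), .groups = "drop") %>%
    rename(REPORT_VALUE = !!sym(col)) %>%
    mutate(REPORT_VALUE = as.character(REPORT_VALUE),
           REPORT_CATEGORY = col)
}

cat.list <- c("REPORT_CODE", "PAYMENT_METHOD", "INBOUND_CHANNEL", "AMOUNT_CAT")

# Apply the helper to each column name and stack the pieces into one long data frame.
final_product <- bind_rows(lapply(cat.list, summarise_by, data = sample_data))
```

Dropping the groups inside the helper means final_product comes back ungrouped, unlike the result printed above, which is still grouped by REPORT_VALUE.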