For-loop to summarize and joining by dplyr
Question
Here is my simplified df:
GP_A <- c(rep("a",3),rep("b",2),rep("c",2))
GP_B <- c(rep("d",2),rep("e",4),rep("f",1))
GENDER <- c(rep("M",4),rep("F",3))
LOC <- c(rep("HK",2),rep("UK",3),rep("JP",2))
SCORE <- c(50,70,80,20,30,80,90)
# data.frame() keeps SCORE numeric; as.data.frame(cbind(...)) would coerce every column to character
df <- data.frame(GP_A, GP_B, GENDER, LOC, SCORE)
> df
  GP_A GP_B GENDER LOC SCORE
1    a    d      M  HK    50
2    a    d      M  HK    70
3    a    e      M  UK    80
4    b    e      M  UK    20
5    b    e      F  UK    30
6    c    e      F  JP    80
7    c    f      F  JP    90
I want to summarize SCORE by GP_A, GP_B, or other grouping columns that are not shown in this example. Because there may be up to 50 grouping columns, I decided to use a for-loop to summarize the score.
My original method summarizes the score for one grouping column at a time:
GP_A_SCORE <- df %>% group_by(GP_A,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
GP_B_SCORE <- df %>% group_by(GP_B,GENDER,LOC) %>% summarize(SCORE=mean(SCORE))
...
What I want is a for-loop like this (it does not run):
GP_list <- c("GP_A","GP_B",...)
LOC_list <- c("HK","UK","JP",...)
SCORE <- list()
for (i in GP_list){
  for (j in LOC_list){
    SCORE[[paste0(i,j)]] <- df %>% group_by(i,j,GENDER) %>% summarize(SCORE=mean(SCORE))
  }
}
Inside group_by(), the loop variables i and j are character strings rather than column names, and this error is shown:
Error: Column `i`, `j` is unknown
Is there any way to force R to recognize the variable?
I am facing the same problem with dplyr's left_join.
An error is shown when I do something like left_join(x,y,by=c(i=i)) inside a loop.
Recommended answer
You could reshape the data into long format, so that all GP columns are stacked into one name/value pair, and then calculate the mean:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = starts_with('GP')) %>%
  group_by(GENDER, LOC, name, value) %>%
  summarise(SCORE = mean(SCORE))
#    GENDER LOC   name  value SCORE
#    <fct>  <fct> <chr> <fct> <dbl>
#  1 F      JP    GP_A  c        85
#  2 F      JP    GP_B  e        80
#  3 F      JP    GP_B  f        90
#  4 F      UK    GP_A  b        30
#  5 F      UK    GP_B  e        30
#  6 M      HK    GP_A  a        60
#  7 M      HK    GP_B  d        60
#  8 M      UK    GP_A  a        80
#  9 M      UK    GP_A  b        20
# 10 M      UK    GP_B  e        50
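If separate per-column tables are still wanted, the loop from the question can also be made to work. The following is a minimal sketch, not the answerer's method. It assumes dplyr 1.0.0 or later, where across(all_of(...)) inside group_by() selects columns by the names held in a character vector. The names scores and joined are made up for illustration, and the data frame is rebuilt with data.frame() so that SCORE is numeric. The inner loop over LOC values is dropped on the assumption that grouping by the LOC column already separates the locations.

```r
library(dplyr)

# Example data from the question; data.frame() keeps SCORE numeric
df <- data.frame(
  GP_A   = c(rep("a", 3), rep("b", 2), rep("c", 2)),
  GP_B   = c(rep("d", 2), rep("e", 4), rep("f", 1)),
  GENDER = c(rep("M", 4), rep("F", 3)),
  LOC    = c(rep("HK", 2), rep("UK", 3), rep("JP", 2)),
  SCORE  = c(50, 70, 80, 20, 30, 80, 90)
)

GP_list <- c("GP_A", "GP_B")  # grouping column names stored as strings

scores <- list()  # hypothetical name: one summary table per grouping column
joined <- list()  # hypothetical name: df with the group mean joined back on

for (i in GP_list) {
  keys <- c(i, "GENDER", "LOC")

  # all_of() turns the character vector into a column selection,
  # so group_by() sees the columns GP_A / GP_B instead of a column called "i"
  scores[[i]] <- df %>%
    group_by(across(all_of(keys))) %>%
    summarise(SCORE = mean(SCORE), .groups = "drop")

  # by = accepts a plain character vector of column names;
  # c(i = i) fails because the name on the left is taken literally as "i"
  joined[[i]] <- left_join(df, scores[[i]], by = keys,
                           suffix = c("", "_MEAN"))
}

scores[["GP_A"]]
```

When the key column has different names in the two tables, by = setNames(name_in_y, name_in_x) builds the named vector from strings, which c(i = i) cannot do.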