Assign max value of group to all rows in that group
Question
I would like to assign the max value of a group to all rows within that group. How do I do that?
I have a data frame containing the name of each group and the maximum number of credits that belongs to it:
course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)
This gives:
Course Credits
1 ABC1000 6.5
2 ABC1003 6.5
3 ABC1004 6.5
4 ABC1007 5.0
5 ABC1010 6.5
6 ABC1021 6.5
7 ABC1023 6.5
The main data frame looks like this:
Appraisal.Type Resits Credits Course_code Student_ID
Final result 0 6.5 ABC1000 10
Final result 0 6.5 ABC1003 10
Grade supervisor 0 0 ABC1000 10
Grade supervisor 0 0 ABC1003 10
Final result 0 12 ABC1294 23
Grade supervisor 0 0 ABC1294 23
As you see, student 10 took course ABC1000, worth 6.5 credits. For each course (per student), however, two rows exist: Final result and Grade supervisor. In the end, Final result should be deleted, but the credits should be kept. Therefore, I want to assign the max value of 6.5 to the Grade supervisor row. Likewise, student 23 has followed course ABC1294, worth 12 credits.
In the end, the result should be:
Appraisal.Type Resits Credits Course_code Student_ID
Grade supervisor 0 6.5 ABC1000 10
Grade supervisor 0 6.5 ABC1003 10
Grade supervisor 0 12 ABC1294 23
How do I do that?
Recommended answer
One option is to group by 'Student_ID', use mutate to replace 'Credits' with the max of 'Credits' within each group, and then filter the rows where 'Appraisal.Type' is "Grade supervisor":
library(dplyr)
df1 %>%
  group_by(Student_ID) %>%
  dplyr::mutate(Credits = max(Credits)) %>%
  ungroup() %>%
  filter(Appraisal.Type == "Grade supervisor")
# A tibble: 2 x 5
# Appraisal.Type Resits Credits Course_code Student_ID
# <chr> <int> <dbl> <chr> <int>
#1 Grade supervisor 0 6.5 ABC1000 10
#2 Grade supervisor 0 6.5 ABC1003 10
If 'Course_code' also needs to be included in the grouping:
df2 %>%
  group_by(Student_ID, Course_code) %>%
  dplyr::mutate(Credits = max(Credits)) %>%
  filter(Appraisal.Type == "Grade supervisor")
# A tibble: 3 x 5
# Groups: Student_ID, Course_code [3]
# Appraisal.Type Resits Credits Course_code Student_ID
# <chr> <int> <dbl> <chr> <int>
#1 Grade supervisor 0 6.5 ABC1000 10
#2 Grade supervisor 0 6.5 ABC1003 10
#3 Grade supervisor 0 12 ABC1294 23
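For comparison, the same per-group maximum can probably be sketched in base R with ave() instead of dplyr. This is only an illustration under assumptions, not part of the original answer: the data frame name scores and the helper column name key are hypothetical, and the data is a reduced copy of the question's example.

```r
# Hypothetical data frame mirroring the question's layout (reduced to two courses).
scores <- data.frame(
  Appraisal.Type = c("Final result", "Grade supervisor",
                     "Final result", "Grade supervisor"),
  Credits = c(6.5, 0, 12, 0),
  Course_code = c("ABC1000", "ABC1000", "ABC1294", "ABC1294"),
  Student_ID = c(10L, 10L, 23L, 23L),
  stringsAsFactors = FALSE
)

# Build one grouping key per student/course pair, so that only combinations
# that actually occur in the data form a group.
key <- paste(scores$Student_ID, scores$Course_code)

# ave() applies FUN within each group and returns a vector aligned with the
# original rows, so every row receives its group's maximum.
scores$Credits <- ave(scores$Credits, key, FUN = max)

# Keep only the "Grade supervisor" rows, which now carry the group maximum.
result <- scores[scores$Appraisal.Type == "Grade supervisor", ]
print(result)
```

Because ave() returns a vector of the same length as its input, the assignment overwrites Credits row by row, which plays the same role as mutate after group_by in the dplyr version.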
NOTE: If the plyr package is also loaded, some functions can be masked, especially summarise/mutate, which are also found in plyr. To prevent this, either run the code in a fresh session without loading plyr, or explicitly specify dplyr::mutate.
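One way to check for this kind of masking, offered as a rough sketch rather than part of the original answer, is to ask R which attached packages define a function called mutate. The variable name hits is hypothetical. In a session where neither plyr nor dplyr is attached, the result is simply empty.

```r
# utils::find() lists the search-path entries that define the given name,
# in search-path order; an unqualified call resolves to the first entry.
hits <- utils::find("mutate", mode = "function")
print(hits)

# More than one entry means one package's mutate is masking another's,
# so qualifying the call (dplyr::mutate) removes the ambiguity.
if (length(hits) > 1) {
  message("mutate is defined in several attached packages: ",
          paste(hits, collapse = ", "))
}
```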
df1 <- structure(list(Appraisal.Type = c("Final result", "Final result",
"Grade supervisor", "Grade supervisor"), Resits = c(0L, 0L, 0L,
0L), Credits = c(6.5, 6.5, 0, 0), Course_code = c("ABC1000",
"ABC1003", "ABC1000", "ABC1003"), Student_ID = c(10L, 10L, 10L,
10L)), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(Appraisal.Type = c("Final result", "Final result",
"Grade supervisor", "Grade supervisor", "Final result", "Grade supervisor"
), Resits = c(0L, 0L, 0L, 0L, 0L, 0L), Credits = c(6.5, 6.5,
0, 0, 12, 0), Course_code = c("ABC1000", "ABC1003", "ABC1000",
"ABC1003", "ABC1294", "ABC1294"), Student_ID = c(10L, 10L, 10L,
10L, 23L, 23L)), class = "data.frame", row.names = c(NA, -6L))