Assign max value of group to all rows in that group


Problem description

I would like to assign the max value of a group to all rows within that group. How do I do that?

I have a data frame containing the name of each group and the maximum number of credits that belongs to it.

course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)

This gives:

    Course    Credits
1   ABC1000  6.5
2   ABC1003  6.5
3   ABC1004  6.5
4   ABC1007  5.0
5   ABC1010  6.5
6   ABC1021  6.5
7   ABC1023  6.5

The main data frame looks like this:

Appraisal.Type   Resits   Credits Course_code   Student_ID          
Final result       0       6.5    ABC1000           10                
Final result       0       6.5    ABC1003           10               
Grade supervisor   0       0      ABC1000           10               
Grade supervisor   0       0      ABC1003           10 
Final result       0       12     ABC1294           23   
Grade supervisor   0       0      ABC1294           23     

As you can see, student 10 took course ABC1000, worth 6.5 credits. For each course (per student), however, two rows exist: Final result and Grade supervisor. In the end, the Final result rows should be deleted, but the credits should be kept. Therefore, I want to assign the max value of 6.5 to the Grade supervisor row. Likewise, student 23 took course ABC1294, worth 12 credits.

In the end, the result should be:

Appraisal.Type   Resits   Credits Course_code   Student_ID                      
Grade supervisor   0       6.5      ABC1000           10               
Grade supervisor   0       6.5      ABC1003           10    
Grade supervisor   0       12       ABC1294           23               

How do I do that?

Recommended answer

One option is to group by 'Student_ID', use mutate to replace 'Credits' with the group's max of 'Credits', and then filter the rows where 'Appraisal.Type' is "Grade supervisor":

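library(dplyr)

df1 %>%
   group_by(Student_ID) %>%                       # one group per student
   dplyr::mutate(Credits = max(Credits)) %>%      # give every row its group's maximum
   ungroup() %>%                                  # drop the grouping again
   filter(Appraisal.Type == "Grade supervisor")   # keep only the supervisor rows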
# A tibble: 2 x 5
#  Appraisal.Type   Resits Credits Course_code Student_ID
#  <chr>             <int>   <dbl> <chr>            <int>
#1 Grade supervisor      0     6.5 ABC1000             10
#2 Grade supervisor      0     6.5 ABC1003             10
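
For readers who prefer to avoid the dplyr dependency, the same "give every row its group's maximum" step can be sketched in base R with ave(), which applies a function within each group and returns a vector as long as its input. This is an alternative sketch, not the method used in the answer; the data frame below simply rebuilds the df1 example data so that the block runs on its own.

```r
# Rebuild the df1 example data so this block is self-contained
df1 <- data.frame(
  Appraisal.Type = c("Final result", "Final result",
                     "Grade supervisor", "Grade supervisor"),
  Resits = 0L,
  Credits = c(6.5, 6.5, 0, 0),
  Course_code = c("ABC1000", "ABC1003", "ABC1000", "ABC1003"),
  Student_ID = 10L,
  stringsAsFactors = FALSE
)

# ave() computes max() per Student_ID and repeats it for every row of the group
df1$Credits <- ave(df1$Credits, df1$Student_ID, FUN = max)

# Keep only the "Grade supervisor" rows, which now carry the group maximum
res <- df1[df1$Appraisal.Type == "Grade supervisor", ]
print(res)
```

Because ave() returns a vector of the original length, it can be assigned straight back into the column without a separate join.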


If we also need 'Course_code' to be included in the grouping:

df2 %>%
  group_by(Student_ID, Course_code) %>% 
  dplyr::mutate(Credits = max(Credits)) %>%  
  filter(Appraisal.Type == "Grade supervisor")
# A tibble: 3 x 5
# Groups:   Student_ID, Course_code [3]
#  Appraisal.Type   Resits Credits Course_code Student_ID
#  <chr>             <int>   <dbl> <chr>            <int>
#1 Grade supervisor      0     6.5 ABC1000             10
#2 Grade supervisor      0     6.5 ABC1003             10
#3 Grade supervisor      0    12   ABC1294             23
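
The question already builds a per-group maximum with aggregate(), so another way to get the same result is to merge those maxima back onto the supervisor rows. The sketch below assumes the grouping is by both 'Student_ID' and 'Course_code'; the names max_credits and supervisor are illustrative and do not come from the original post, and the data frame rebuilds the df2 example data so the block runs on its own.

```r
# Rebuild the df2 example data so this block is self-contained
df2 <- data.frame(
  Appraisal.Type = c("Final result", "Final result", "Grade supervisor",
                     "Grade supervisor", "Final result", "Grade supervisor"),
  Resits = 0L,
  Credits = c(6.5, 6.5, 0, 0, 12, 0),
  Course_code = c("ABC1000", "ABC1003", "ABC1000",
                  "ABC1003", "ABC1294", "ABC1294"),
  Student_ID = c(10L, 10L, 10L, 10L, 23L, 23L),
  stringsAsFactors = FALSE
)

# Step 1: one row per (Student_ID, Course_code) holding the maximum credits
max_credits <- aggregate(Credits ~ Student_ID + Course_code,
                         data = df2, FUN = max)

# Step 2: keep the supervisor rows and drop their placeholder Credits column
supervisor <- df2[df2$Appraisal.Type == "Grade supervisor",
                  names(df2) != "Credits"]

# Step 3: merge the maxima back in on the grouping keys
res <- merge(supervisor, max_credits, by = c("Student_ID", "Course_code"))
print(res)
```

Note that merge() puts the key columns first, so the column order differs from the dplyr output even though the values are the same.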

NOTE: In case the plyr package is also loaded, functions such as summarise/mutate, which also exist in plyr, can be masked. To prevent this, either run the code in a fresh session without loading plyr or explicitly call dplyr::mutate.
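
To check which package an unqualified function name currently resolves to, a small helper along these lines may be useful. The name where_from is hypothetical (it is not part of dplyr or plyr), and the example uses sd from the stats package so that it runs without extra packages installed.

```r
# Hypothetical helper: report which package's version of a function
# an unqualified call would pick up in the current session.
where_from <- function(fn_name) {
  fn <- get(fn_name, mode = "function")  # first function found on the search path
  environmentName(environment(fn))       # name of the namespace that defines it
}

# A base example that works without additional packages
print(where_from("sd"))

# With dplyr and plyr both attached, where_from("mutate") would show which
# package wins; writing dplyr::mutate() bypasses the search path entirely.
```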

Data used in the examples above:

df1 <- structure(list(Appraisal.Type = c("Final result", "Final result", 
"Grade supervisor", "Grade supervisor"), Resits = c(0L, 0L, 0L, 
0L), Credits = c(6.5, 6.5, 0, 0), Course_code = c("ABC1000", 
"ABC1003", "ABC1000", "ABC1003"), Student_ID = c(10L, 10L, 10L, 
10L)), class = "data.frame", row.names = c(NA, -4L))



df2 <- structure(list(Appraisal.Type = c("Final result", "Final result", 
"Grade supervisor", "Grade supervisor", "Final result", "Grade supervisor"
), Resits = c(0L, 0L, 0L, 0L, 0L, 0L), Credits = c(6.5, 6.5, 
0, 0, 12, 0), Course_code = c("ABC1000", "ABC1003", "ABC1000", 
"ABC1003", "ABC1294", "ABC1294"), Student_ID = c(10L, 10L, 10L, 
10L, 23L, 23L)), class = "data.frame", row.names = c(NA, -6L))
