New columns on Subgroup and Range of percentage in another column


Problem description

I have a sample df like the one below:

df_test<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
                "Sub_group_name"=c("A","A","B","C","D","E","C"),
                "Total%"=c(35,26,10,9,5,11,13))

The original df is quite big. Points to remember about this df:

  • There are only two groups, "Group1" and "Group2".
  • There are multiple subgroups under each group; the df above shows only some of them.
  • Within a group, the Total% values of all its subgroups add up to 100%. They do not in the df above because it is only a sample. So for Group1, all subgroups (A, B, C, etc.) add up to 100, and likewise for Group2. The subgroups of Group1 and Group2 will be more or less the same.

Question:

I need to create a column called category that is based on ranges of Total% at the Group.Name level. The conditions for creating the new column are:

  • For every Group.Name, wherever Total% is highest, the category is whatever the Sub_group_name is.
  • For every Group.Name, where Total% is between 10 and 30, the category is "New_Group1".
  • For every Group.Name, where Total% is less than 10, the category is "New_Group2".

Expected output:

df_output<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
                     "Sub_group_name"=c("A","A","B","C","D","E","C"),
                     "Total%"=c(35,26,10,9,5,11,13),
                     "category"=c("A","A","New_Group1","New_Group2","New_Group2","New_Group1","New_Group1"))


Recommended answer

We can do this with cut to create the labels from the corresponding breaks, and then, for the row with the highest 'Total%' in each 'Group.Name', replace the label with the corresponding 'Sub_group_name'.

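library(dplyr)

df_test %>%
  group_by(Group.Name) %>%
  mutate(
    # Bin Total% into [-Inf, 10), [10, 30), [30, Inf); right = FALSE closes
    # each interval on the left, so exactly 10 is labelled "New_Group1"
    category = as.character(cut(`Total%`, breaks = c(-Inf, 10, 30, Inf),
                                labels = c("New_Group2", "New_Group1", "Other"),
                                right = FALSE)),
    # Within each group, the row with the highest Total% takes its own
    # Sub_group_name as the category
    category = case_when(`Total%` == max(`Total%`) ~ Sub_group_name,
                         TRUE ~ category)
  )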
# A tibble: 7 x 4
# Groups:   Group.Name [2]
#  Group.Name Sub_group_name `Total%` category  
#  <chr>      <chr>             <dbl> <chr>     
#1 Group1     A                    35 A         
#2 Group2     A                    26 A         
#3 Group1     B                    10 New_Group1
#4 Group2     C                     9 New_Group2
#5 Group2     D                     5 New_Group2
#6 Group2     E                    11 New_Group1
#7 Group1     C                    13 New_Group1
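
The choice of right = FALSE decides which label a value sitting exactly on a break receives. The following is a minimal sketch of that behaviour, not part of the original answer; the vector x is a hypothetical set of values picked to sit on and around the breaks 10 and 30.

```r
# Hypothetical values on and around the break points (not from the question's data)
x <- c(5, 9, 10, 11, 30, 35)
brks <- c(-Inf, 10, 30, Inf)
labs <- c("New_Group2", "New_Group1", "Other")

# right = FALSE: intervals are [-Inf, 10), [10, 30), [30, Inf)
left_closed <- as.character(cut(x, breaks = brks, labels = labs, right = FALSE))

# right = TRUE (the default): intervals are (-Inf, 10], (10, 30], (30, Inf]
right_closed <- as.character(cut(x, breaks = brks, labels = labs))

# Side-by-side view: the two variants differ only at x = 10 and x = 30
print(data.frame(x, left_closed, right_closed))
```

With right = FALSE, a Total% of exactly 10 is labelled "New_Group1", which matches the expected output for Group1 / B. A value of exactly 30 that is not the group maximum would be labelled "Other"; the question does not say how that edge case should be treated, so adjust the breaks if 30 should count as "New_Group1".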



Data

df_test <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2",
                                       "Group2", "Group2", "Group1"),
                      "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                      "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                      stringsAsFactors = FALSE, check.names = FALSE)
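
The check.names = FALSE argument matters here: by default data.frame rewrites "Total%" into the syntactically valid name "Total.", so the backticked `Total%` used in the answer would not be found. The sketch below first shows that renaming, then repeats the same logic in base R as a cross-check. It is one possible equivalent, not the original answer's method, and the helper names default_names, kept_names, grp_max and binned are illustrative only.

```r
# Default check.names = TRUE mangles the name; check.names = FALSE keeps it
default_names <- names(data.frame("Total%" = 1))
kept_names <- names(data.frame("Total%" = 1, check.names = FALSE))
print(default_names)
print(kept_names)

# Same sample data as in the Data section
df_test <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2",
                                       "Group2", "Group2", "Group1"),
                      "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                      "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                      stringsAsFactors = FALSE, check.names = FALSE)

# Per-row maximum of Total% within the row's own Group.Name
grp_max <- ave(df_test[["Total%"]], df_test[["Group.Name"]], FUN = max)

# Range-based labels, using the same breaks as the dplyr version
binned <- as.character(cut(df_test[["Total%"]], breaks = c(-Inf, 10, 30, Inf),
                           labels = c("New_Group2", "New_Group1", "Other"),
                           right = FALSE))

# Rows holding the group maximum take their Sub_group_name instead of the label
df_test$category <- ifelse(df_test[["Total%"]] == grp_max,
                           df_test$Sub_group_name, binned)
print(df_test)
```

On the sample data this yields the same category column as the dplyr output shown above. If several rows in a group tie for the maximum, both versions assign each tied row its own Sub_group_name.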
