New columns on Subgroup and Range of percentage in another column
Problem description
I have a sample df like the one below:
df_test<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
"Sub_group_name"=c("A","A","B","C","D","E","C"),
"Total%"=c(35,26,10,9,5,11,13))
The original df is quite big. Points to remember about this df:
- There are only 2 groups, "Group1" and "Group2".
- There are multiple subgroups under one group; the df above shows only some of them.
- The Total% values of all subgroups within a group add up to 100%. They do not in the df above because it is only a sample. So for Group1, all subgroups (A, B, C, etc.) add up to 100, and likewise for Group2. The subgroups of Group1 and Group2 will be more or less the same.
Question:
I need to create a column called category that is based on ranges of Total% at the Group.Name level. The conditions for creating the new column are:
- For every Group.Name, wherever Total% is highest, the category column is whatever the Sub_group_name is.
- For every Group.Name and Total% between 10 and 30, the category column is "New_Group1".
- For every Group.Name and Total% less than 10, the category column is "New_Group2".
Expected output:
df_output<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
"Sub_group_name"=c("A","A","B","C","D","E","C"),
"Total%"=c(35,26,10,9,5,11,13),
"category"=c("A","A","New_Group1","New_Group2","New_Group2","New_Group1","New_Group1"))
Recommended answer
We can do this with cut to create the labels with the corresponding breaks, and then replace the category of the row whose 'Total%' is the highest in each 'Group.Name' with the corresponding 'Sub_group_name'.
library(dplyr)

df_test %>%
  group_by(Group.Name) %>%
  mutate(
    # Bin Total% into [-Inf, 10), [10, 30), [30, Inf)
    category = as.character(cut(`Total%`, breaks = c(-Inf, 10, 30, Inf),
                                labels = c("New_Group2", "New_Group1", "Other"),
                                right = FALSE)),
    # Within each group, the row with the highest Total% takes its Sub_group_name
    category = case_when(`Total%` == max(`Total%`) ~ Sub_group_name,
                         TRUE ~ category)
  )
# A tibble: 7 x 4
# Groups: Group.Name [2]
# Group.Name Sub_group_name `Total%` category
# <chr> <chr> <dbl> <chr>
#1 Group1 A 35 A
#2 Group2 A 26 A
#3 Group1 B 10 New_Group1
#4 Group2 C 9 New_Group2
#5 Group2 D 5 New_Group2
#6 Group2 E 11 New_Group1
#7 Group1 C 13 New_Group1
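The binning step in the answer above can be checked in isolation. The following is a minimal base R sketch, assuming the same breaks and labels as the answer; the vector name total and the result name bins are only illustrative. It shows that right = FALSE makes each interval closed on the left, so a value of exactly 10 lands in "New_Group1" rather than "New_Group2".

```r
# Illustration only: the Total% values from the sample data.
total <- c(35, 26, 10, 9, 5, 11, 13)

# right = FALSE gives left-closed intervals:
# [-Inf, 10), [10, 30), [30, Inf)
bins <- cut(total,
            breaks = c(-Inf, 10, 30, Inf),
            labels = c("New_Group2", "New_Group1", "Other"),
            right = FALSE)

as.character(bins)
# "Other" "New_Group1" "New_Group1" "New_Group2" "New_Group2" "New_Group1" "New_Group1"
```

With these breaks a value of exactly 30 would fall into "Other". If "between 10-30" is meant to include 30, the upper break would need adjusting; that is a judgment call the question does not settle.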
Data
df_test<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2",
"Group2","Group1"),
"Sub_group_name"=c("A","A","B","C","D","E","C"),
"Total%"=c(35,26,10,9,5,11,13), stringsAsFactors = FALSE,
check.names = FALSE)
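The check.names = FALSE argument in the data definition is what keeps the column name as Total%. A small sketch of the difference, using throwaway objects (default_names and kept_names are illustrative names, not part of the original post):

```r
# Default (check.names = TRUE): names are passed through make.names(),
# which replaces the invalid character "%" with "."
default_names <- names(data.frame("Total%" = c(35, 26)))
default_names
# "Total."

# check.names = FALSE keeps the name exactly as written
kept_names <- names(data.frame("Total%" = c(35, 26), check.names = FALSE))
kept_names
# "Total%"
```

If the data frame is built without check.names = FALSE, as in the question, the column would presumably have to be referred to as Total. instead of `Total%` in the dplyr code.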