如何将一列分隔为多列(复杂列) [英] How to separate one column to multiple column (complex column)
本文介绍了如何将一列分隔为多列(复杂列)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在尝试根据其主题和等级将等级列分为多个列
grade<-read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv",sep=";")
# Rename the column names
names(grade)<-c("Student_ID","Name","Venue","Grade")
head(grade)
# Separate `Grade` into `subject` variables and coresponding `Grade`columns
library(tidyverse)
df<- grade %>% separate(Grade,paste("V",1:7,sep="_"),sep=":")
head(df)
# It still is not separating `subject ` and `grade` independently
# Here is what I want it to look like
new_df<-df[c(1:5),c(1:4)]
new_df<-data.frame(new_df, V2=c(1:5)) # the same for V2,4,5,6,,7 to separate subject and grade
new_df
我正在尝试使用dplyr a nd stringr,但无法产生预期的结果
推荐答案
在下面的解决方案中,我使用了函数来自 tidyverse
和 rebus
软件包。 rebus
包使用人类可读的代码逐段构建正则表达式。
In my solution below, I have used functions from the tidyverse
and rebus
packages. The rebus
package builds regular expressions piece by piece using human readable code.
library(tidyverse)
library(rebus)
grade<-read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv",
sep = ";", stringsAsFactors = FALSE)
grade_new <- grade %>%
mutate(DIEM_THI2 = str_replace_all(DIEM_THI, pattern = ":" %R% one_or_more(SPC), "-")) %>%
separate_rows(DIEM_THI2, sep = one_or_more(SPC)) %>%
separate(DIEM_THI2, c("SUBJECT", "GRADE"), sep = "-") %>%
spread(SUBJECT,GRADE)
结果数据帧如下所示:
head(grade_new[,5:12])
# Biology Chemitry English Geography History Literature Math Physics
# 1 6.00 6.00 <NA> <NA> <NA> 7.50 4.25 6.80
# 2 5.80 6.00 <NA> <NA> <NA> 6.00 5.75 <NA>
# 3 <NA> <NA> <NA> 8.00 4.50 7.75 2.25 <NA>
# 4 <NA> <NA> <NA> 7.25 7.50 7.75 3.25 <NA>
# 5 <NA> <NA> <NA> 7.75 4.50 8.25 1.75 <NA>
# 6 <NA> 6.60 6.78 <NA> <NA> 7.00 8.75 8.40
代码可以理解如下:
- 所有冒号和空格子字符串均替换为连字符。即
数学:4.25文学:7.50
变为数学4.25文学7.50
。这是通过str_replace_all
函数完成的。让我们调用新变量DIEM_THI2
。 -
separate_rows
函数可拆分空间-分隔的列,DIEM_THI2
进入单独的行,即Math-4.25
和Literature- 7.50
跨两行。 -
DIEM_THI2
列分为两列,即SUBJECT
和GRADE
,其中前者包含类似Math
的值,文学
,后者包含类似4.25
和7.50
。 - 键值对或SUBJECT-GRADE对分布在多个列中。
- All colon+space substrings are replaced with hyphens. i.e.
"Math: 4.25 Literature: 7.50"
becomes"Math-4.25 Literature-7.50"
. This is done using thestr_replace_all
function. Lets call the new variableDIEM_THI2
. - The
separate_rows
function splits the space-separated column,DIEM_THI2
into separate rows i.e."Math-4.25"
and"Literature-7.50"
span over two different rows. - The
DIEM_THI2
column is separated into two columns, i.e.SUBJECT
andGRADE
where the former contains values like"Math"
,"Literature"
and the latter contains values like"4.25"
and"7.50"
. - The key-value pair or SUBJECT-GRADE pair are spread across multiple columns.
这篇关于如何将一列分隔为多列(复杂列)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文