如何将一列分隔为多列(复杂列) [英] How to separate one column to multiple column (complex column)

查看:113
本文介绍了如何将一列分隔为多列(复杂列)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试根据其主题和等级将等级列分为多个列

    grade<-read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv",sep=";")

# Rename the column names

    names(grade)<-c("Student_ID","Name","Venue","Grade")

    head(grade)

    # Separate `Grade` into `subject` variables and coresponding `Grade`columns
    library(tidyverse)


    df<- grade %>% separate(Grade,paste("V",1:7,sep="_"),sep=":")

    head(df)

    # It still is not separating `subject ` and `grade` independently

    # Here is what I want it to look like

    new_df<-df[c(1:5),c(1:4)]

    new_df<-data.frame(new_df, V2=c(1:5)) # the same for V2,4,5,6,,7 to separate subject and grade

    new_df 

我正在尝试使用dplyr a nd stringr,但无法产生预期的结果

推荐答案

在下面的解决方案中,我使用了函数来自 tidyverse rebus 软件包。 rebus 包使用人类可读的代码逐段构建正则表达式。

In my solution below, I have used functions from the tidyverse and rebus packages. The rebus package builds regular expressions piece by piece using human readable code.

 library(tidyverse)
 library(rebus)
 grade<-read.csv("https://raw.githubusercontent.com/tuyenhavan/Statistics/Dataset/High_school_Grade.csv",
                 sep = ";", stringsAsFactors = FALSE)

 grade_new <- grade %>%
   mutate(DIEM_THI2 = str_replace_all(DIEM_THI, pattern = ":" %R% one_or_more(SPC), "-")) %>%
   separate_rows(DIEM_THI2, sep = one_or_more(SPC)) %>%
   separate(DIEM_THI2, c("SUBJECT", "GRADE"), sep = "-") %>%
   spread(SUBJECT,GRADE)

结果数据帧如下所示:

head(grade_new[,5:12])
#   Biology Chemitry English Geography History Literature Math Physics
# 1    6.00     6.00    <NA>      <NA>    <NA>       7.50 4.25    6.80
# 2    5.80     6.00    <NA>      <NA>    <NA>       6.00 5.75    <NA>
# 3    <NA>     <NA>    <NA>      8.00    4.50       7.75 2.25    <NA>
# 4    <NA>     <NA>    <NA>      7.25    7.50       7.75 3.25    <NA>
# 5    <NA>     <NA>    <NA>      7.75    4.50       8.25 1.75    <NA>
# 6    <NA>     6.60    6.78      <NA>    <NA>       7.00 8.75    8.40

代码可以理解如下:


  1. 所有冒号和空格子字符串均替换为连字符。即数学:4.25文学:7.50 变为数学4.25文学7.50 。这是通过 str_replace_all 函数完成的。让我们调用新变量 DIEM_THI2

  2. separate_rows 函数可拆分空间-分隔的列, DIEM_THI2 进入单独的行,即 Math-4.25 Literature- 7.50 跨两行。

  3. DIEM_THI2 列分为两列,即 SUBJECT GRADE ,其中前者包含类似 Math 的值,文学 ,后者包含类似 4.25 7.50

  4. 键值对或SUBJECT-GRADE对分布在多个列中。

  1. All colon+space substrings are replaced with hyphens. i.e. "Math: 4.25 Literature: 7.50" becomes "Math-4.25 Literature-7.50". This is done using the str_replace_all function. Lets call the new variable DIEM_THI2.
  2. The separate_rows function splits the space-separated column, DIEM_THI2 into separate rows i.e. "Math-4.25" and "Literature-7.50" span over two different rows.
  3. The DIEM_THI2 column is separated into two columns, i.e. SUBJECT and GRADE where the former contains values like "Math", "Literature" and the latter contains values like "4.25" and "7.50".
  4. The key-value pair or SUBJECT-GRADE pair are spread across multiple columns.

这篇关于如何将一列分隔为多列(复杂列)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆