Splitting the values in a column using regex

Question

I have a data.frame with two columns, like the following:

dat

    ID                             Details                         
    id_1        box1_homodomain gn=box1 os=homo sapiens p=4 se=1   
    id_2        sox2_plurinet gn=plu os=mus musculus p=5 se=3 

I would like to extract the "os=xxx" and "gn=yyy" values from the "Details" column for all the IDs and print them like the following:

    ID    Description        gn      os
    id_1  box1_homodomain    box1    homo sapiens
    id_2  sox2_plurinet      plu     mus musculus

I tried a gsub approach in R, but I am unable to split os=homo sapiens and gn=box1 into their respective columns. This is the R code I used:

dat$gn=gsub('^[gn=][A-z][A-z]`,dat$Details)
dat$os=gsub('^[os=][A-z][A-z]`,dat$Details)

Can anyone tell me what is wrong and how it can be corrected? Kindly help me.

Thanks in advance.

Recommended answer

Here's an option with tidyr:

library(tidyr)
# specify the new column names:
vars <- c("Description", "gn", "os")
# then separate the "Details" column according to regex and drop extra columns:
separate(dat, Details, into = vars, sep = "[A-Za-z]+=", extra = "drop")
#    ID      Description    gn            os
#1 id_1 box1_homodomain  box1  homo sapiens 
#2 id_2   sox2_plurinet   plu  mus musculus
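
As for what goes wrong in the attempt from the question: in a regex, '[gn=]' is a character class that matches one single character (g, n or =), not the literal text "gn="; the leading '^' anchors the match to the start of the string, where "gn=" never occurs; '[A-z]' also covers a few punctuation characters that sit between Z and a in ASCII order; and gsub() needs a replacement argument, which the calls do not supply. A minimal base R sketch that avoids these problems could look like the following. The helper name extract_field and its pattern are illustrative assumptions, not part of the original answer, and the sketch assumes every field has the form key=value with keys made of letters only.

```r
# Rebuild the example data from the question.
dat <- data.frame(
  ID = c("id_1", "id_2"),
  Details = c("box1_homodomain gn=box1 os=homo sapiens p=4 se=1",
              "sox2_plurinet gn=plu os=mus musculus p=5 se=3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): return the text that
# follows "key=" up to the next " word=" token or the end of the string.
# Strings that do not contain the key yield NA.
extract_field <- function(x, key) {
  pattern <- paste0("^.*\\b", key, "=(.*?)(\\s+[A-Za-z]+=.*)?$")
  out <- sub(pattern, "\\1", x, perl = TRUE)
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

result <- data.frame(
  ID = dat$ID,
  # The description is everything before the first " key=" token.
  Description = sub("\\s+[A-Za-z]+=.*$", "", dat$Details),
  gn = extract_field(dat$Details, "gn"),
  os = extract_field(dat$Details, "os"),
  stringsAsFactors = FALSE
)

print(result)
```

Unlike the separate() call, which splits on every "key=" token and therefore leaves a trailing space on each value, this version looks each key up by name, so it does not depend on gn and os being the first two fields. If the tidyr approach is preferred, wrapping the new columns in trimws() should remove the trailing spaces.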
