Splitting the values in a column using regex
Problem description
I have a data.frame with two columns like the following:
dat
ID Details
id_1 box1_homodomain gn=box1 os=homo sapiens p=4 se=1
id_2 sox2_plurinet gn=plu os=mus musculus p=5 se=3
I would like to extract the "os=xxx" and "gn=yyy" values from the "Details" column for all IDs and print them like this:
ID Description gn os
id_1 box1_homodomain box1 homo sapiens
id_2 sox2_plurinet plu mus musculus
I tried a gsub approach in R, but I am unable to split os=homo sapiens and gn=box1 into their respective columns. This is the R code I used:
dat$gn=gsub('^[gn=][A-z][A-z]',dat$Details)
dat$os=gsub('^[os=][A-z][A-z]',dat$Details)
Can anyone tell me what is wrong and how it can be corrected? Kindly help me.
Thanks in advance.
Recommended answer
Here's an option with tidyr:
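library(tidyr)

# example data from the question
dat <- data.frame(
  ID = c("id_1", "id_2"),
  Details = c("box1_homodomain gn=box1 os=homo sapiens p=4 se=1",
              "sox2_plurinet gn=plu os=mus musculus p=5 se=3")
)

# specify the new column names:
vars <- c("Description", "gn", "os")
# then separate the "Details" column at every "key=" tag, also consuming the space
# before it (so no trailing blanks are left in the values), and drop the extra
# pieces (the p= and se= values):
separate(dat, Details, into = vars, sep = "\\s*[A-Za-z]+=", extra = "drop")
#     ID     Description   gn           os
# 1 id_1 box1_homodomain box1 homo sapiens
# 2 id_2   sox2_plurinet  plu mus musculus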
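As for what went wrong in the question's gsub attempt: `gsub()` takes three arguments (`pattern`, `replacement`, `x`), so the call never receives the input vector; it replaces matches rather than extracting them; `[gn=]` is a character class matching one single character (`g`, `n` or `=`), not the literal text `gn=`; the `^` anchor forces the match to the very start of the string, where the tags never appear; and `[A-z]` in an ASCII-ordered locale also covers `[`, `\`, `]`, `^`, `_` and the backtick. A minimal base-R sketch, assuming each tag occurs once per row and that `os=` is always followed by ` p=` as in the example data, uses `sub()` with a capture group and keeps only the captured part:

```r
# example data from the question
dat <- data.frame(
  ID = c("id_1", "id_2"),
  Details = c("box1_homodomain gn=box1 os=homo sapiens p=4 se=1",
              "sox2_plurinet gn=plu os=mus musculus p=5 se=3")
)

# Description: everything before the first space
dat$Description <- sub(" .*", "", dat$Details)
# gn: the run of non-space characters after the literal text "gn="
dat$gn <- sub(".*gn=(\\S+).*", "\\1", dat$Details)
# os: everything after "os=" up to the space before "p=" (os values contain spaces)
dat$os <- sub(".*os=(.*) p=.*", "\\1", dat$Details)

dat[, c("ID", "Description", "gn", "os")]
#     ID     Description   gn           os
# 1 id_1 box1_homodomain box1 homo sapiens
# 2 id_2   sox2_plurinet  plu mus musculus
```

The replacement `"\\1"` swaps the whole string for the first captured group, which is how `sub()` can be used to extract a substring. If the tag order or the tag that follows `os=` varies between rows, the `os` pattern would need adjusting.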