How to make a variable by extracting a specific line?


Question

I have data like the sample below, where each gene name (e.g. ABCB9) is followed by the SNP names that belong to it (either an rs number or c#_pos#). In SNPs named like c#_pos000000, # ranges from 1 to 22 (the chromosome number).

ABCB9  
rs11057374  
rs7138100  
c22_pos41422393  
rs12309481  
END  

ABCC10  
rs1214748  
END  

HDAC9  
rs928578  
rs10883039  
END  

HCN2
rs12428035  
rs9561933  
c2_pos102345
rs3848077  
rs3099362    
END 

Using this data, I want to produce output like this:

rs11057374        ABCB9  
rs7138100         ABCB9  
c22_pos41422393   ABCB9  
rs12309481        ABCB9  

rs1214748         ABCC10   

rs928578          HDAC9    
rs10883039        HDAC9    

rs12428035        HCN2     
rs9561933         HCN2      
c2_pos102345      HCN2      
rs3848077         HCN2      
rs3099362         HCN2  

It does not matter whether the blank lines and the "END" markers are kept in the output.

How can I generate this output in R or on Linux?

Recommended answer

We can do this slightly differently. Read the file with readLines and strip the leading/trailing whitespace with trimws. Then split 'lines1' by a grouping vector built from the blank lines (cumsum(lines1 == "")), and drop the "" and "END" strings from each list element. Finally, name the list by the first entry of each element (sapply(lst1, `[`, 1)), keep every entry except the first (lapply(lst1, `[`, -1)), and stack the result.

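# 'lines' holds the raw file contents (see "Data")
# Strip leading/trailing whitespace from every line
lines1 <- trimws(lines)
# Start a new group at each blank line, then drop "" and "END" within each group
lst1 <- lapply(split(lines1, cumsum(lines1 == "")),
                function(x) x[!x %in% c("", "END")])

# Name each group by its first entry (the gene), keep the rest (the SNPs), and stack
stack(setNames(lapply(lst1, `[`, -1), sapply(lst1, `[`, 1)))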
#            values    ind
#1       rs11057374  ABCB9
#2        rs7138100  ABCB9
#3  c22_pos41422393  ABCB9
#4       rs12309481  ABCB9
#5        rs1214748 ABCC10
#6         rs928578  HDAC9
#7       rs10883039  HDAC9
#8       rs12428035   HCN2
#9        rs9561933   HCN2
#10    c2_pos102345   HCN2
#11       rs3848077   HCN2
#12       rs3099362   HCN2

Data

lines <- readLines("yourdata.txt") 
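
Since the question also asks about Linux, the same reshaping can be sketched with awk. This is not part of the original answer, only a minimal illustration under stated assumptions: every gene block ends with an "END" line, names contain no internal spaces, and blank lines are optional. The file name sample_snps.txt and the function name snp_gene_pairs are hypothetical, chosen for this example.

```shell
# Hypothetical sample file: a subset of the question's data, trailing spaces included.
cat > sample_snps.txt <<'EOF'
ABCB9  
rs11057374  
rs7138100  
c22_pos41422393  
rs12309481  
END  

ABCC10  
rs1214748  
END  
EOF

# Print "SNP<TAB>gene" for every SNP line of the given file.
# Assumes each block is: gene name, SNP names, then END.
snp_gene_pairs() {
  awk 'BEGIN { OFS = "\t" }
       NF == 0     { next }             # skip blank lines
       $1 == "END" { gene = ""; next }  # block finished, forget the current gene
       gene == ""  { gene = $1; next }  # first line of a block is the gene name
                   { print $1, gene }   # any other line is a SNP of that gene
      ' "$1"
}

snp_gene_pairs sample_snps.txt
# rs11057374	ABCB9
# rs7138100	ABCB9
# c22_pos41422393	ABCB9
# rs12309481	ABCB9
# rs1214748	ABCC10
```

Using $1 rather than $0 relies on awk's default field splitting to discard the leading and trailing spaces, which plays the role of trimws in the R version. Unlike the R code, which starts a new group at each blank line, this sketch starts a new group after each "END", so it would need adjusting for files that omit the END markers.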
