How to search and isolate attributes of FASTA formatted text in R


Problem description

I have a FASTA formatted file, which is essentially a special text file containing many entries. One of them looks like the example below, which I have assigned to the name "FASTA" in R. The original file was read and formatted as seen below using the seqinr package in R.

FASTA<- structure(list(`tr|A1Z6G9|A1Z6G9_DROME` = structure("MSISASHPCGLNADGTATQYKESTATIQTSGLQSSPRSFLPEREDTLEYFIKFPKPSSKNEFVLAKDHDGEDSHVPIVMLLGWAGCQDRYLMKYSKIYEERGLITVRYTAPVDSLFWKRSEMIPIGEKILKLIQDMNFDAHPLIFHIFSNGGAYLYQHINLAVIKHKSPLQVRGVIFDSAPGERRIISLYRAITAIYGREKRCNCLAALVITITLSIMWFVEESISALKSLFVPSSPVRPSPFCDLKNEANRYPQLFLYSKGDIVIPYRDVEKFIRLRRDQGIQVSSVCFEDAEHVKIYTKYPKQYVQCVCNFIRNCMTIPPLKEAVNSEPSESVSRVNLKYD", name = "tr|A1Z6G9|A1Z6G9_DROME", Annot = ">tr|A1Z6G9|A1Z6G9_DROME CG8245 OS=Drosophila melanogaster GN=CG8245-RA PE=2 SV=1", class = "SeqFastaAA")))

This format lets me get the index of an entry (or entries) by name when I search with grep, as seen below

grep("A1Z6G9_DROME", names(FASTA))

or isolate its name using

as.vector(sapply(names(attributes(FASTA)), function(x) attr(FASTA, x)))

However, I cannot grep/regexpr any of the text in the attribute sections, nor can I isolate any of the attributes, such as the text following the name= or Annot= section. Can anyone help me with this?

As far as I could gather from googling read.fasta in R, the manual for the seqinr package states something along the lines of annotations/attributes being ignored (I think), but these attribute sections hold important information about the identity of each entry, which I desperately need! I have tried unlist, and collapsing with the paste function, but both remove all the attributes that I need!

Recommended answer

There are a lot of get* functions in the seqinr package (see http://www.rdocumentation.org/packages/seqinr). These functions are designed to access different attributes, e.g.:

getAnnot(FASTA)
#[[1]]
#[1] ">tr|A1Z6G9|A1Z6G9_DROME CG8245 OS=Drosophila melanogaster GN=CG8245-RA PE=2 SV=1"
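Once the annotations are available as plain character strings, they can be searched with grep just like the entry names. The sketch below uses only base R and reads the Annot attribute directly with attr(), so it does not depend on seqinr being installed. The FASTA object here is a stand-in that mirrors the structure in the question, with the sequence shortened for brevity, and the search term "CG8245" is just an example.

```r
# Stand-in for the object in the question (sequence shortened for brevity)
FASTA <- list(
  `tr|A1Z6G9|A1Z6G9_DROME` = structure(
    "MSISASHPCG",
    name  = "tr|A1Z6G9|A1Z6G9_DROME",
    Annot = ">tr|A1Z6G9|A1Z6G9_DROME CG8245 OS=Drosophila melanogaster GN=CG8245-RA PE=2 SV=1",
    class = "SeqFastaAA"
  )
)

# Collect the Annot attribute of every entry into one character vector
annots <- vapply(FASTA, function(x) attr(x, "Annot"), character(1))

# Search the annotation text instead of the entry names
hits <- grep("CG8245", annots, fixed = TRUE)

# Names of the entries whose annotation matched
matched_names <- names(FASTA)[hits]
print(matched_names)
```

The same index vector can be used to subset the list itself, e.g. FASTA[hits], which keeps the attributes of each entry intact.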

getSequence(FASTA)
#[[1]]
#  [1] "M" "S" "I" "S" "A" "S" "H" "P" "C" "G" "L" "N" "A" "D" "G" "T" "A" "T" "Q" "Y" "K" "E" "S" "T" "A" "T" "I" "Q" "T" "S" "G" "L" "Q" "S" "S" "P" "R" "S" "F" "L" "P" "E" "R" "E" "D" "T" "L" "E" "Y" "F" "I" "K" "F" "P" "K" "P" "S" "S" "K"
# [60] "N" "E" "F" "V" "L" "A" "K" "D" "H" "D" "G" "E" "D" "S" "H" "V" "P" "I" "V" "M" "L" "L" "G" "W" "A" "G" "C" "Q" "D" "R" "Y" "L" "M" "K" "Y" "S" "K" "I" "Y" "E" "E" "R" "G" "L" "I" "T" "V" "R" "Y" "T" "A" "P" "V" "D" "S" "L" "F" "W" "K"
#[119] "R" "S" "E" "M" "I" "P" "I" "G" "E" "K" "I" "L" "K" "L" "I" "Q" "D" "M" "N" "F" "D" "A" "H" "P" "L" "I" "F" "H" "I" "F" "S" "N" "G" "G" "A" "Y" "L" "Y" "Q" "H" "I" "N" "L" "A" "V" "I" "K" "H" "K" "S" "P" "L" "Q" "V" "R" "G" "V" "I" "F"
#[178] "D" "S" "A" "P" "G" "E" "R" "R" "I" "I" "S" "L" "Y" "R" "A" "I" "T" "A" "I" "Y" "G" "R" "E" "K" "R" "C" "N" "C" "L" "A" "A" "L" "V" "I" "T" "I" "T" "L" "S" "I" "M" "W" "F" "V" "E" "E" "S" "I" "S" "A" "L" "K" "S" "L" "F" "V" "P" "S" "S"
#[237] "P" "V" "R" "P" "S" "P" "F" "C" "D" "L" "K" "N" "E" "A" "N" "R" "Y" "P" "Q" "L" "F" "L" "Y" "S" "K" "G" "D" "I" "V" "I" "P" "Y" "R" "D" "V" "E" "K" "F" "I" "R" "L" "R" "R" "D" "Q" "G" "I" "Q" "V" "S" "S" "V" "C" "F" "E" "D" "A" "E" "H"
#[296] "V" "K" "I" "Y" "T" "K" "Y" "P" "K" "Q" "Y" "V" "Q" "C" "V" "C" "N" "F" "I" "R" "N" "C" "M" "T" "I" "P" "P" "L" "K" "E" "A" "V" "N" "S" "E" "P" "S" "E" "S" "V" "S" "R" "V" "N" "L" "K" "Y" "D"
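To isolate individual pieces of an annotation line, such as the text following OS= or GN=, a regular expression over the string is one option. The following is a minimal base R sketch, not part of seqinr: get_field is a hypothetical helper, and it assumes that every field key is two uppercase letters followed by "=", as in the header shown in the question. Headers in other layouts would need a different pattern.

```r
# Annotation line copied from the question
annot <- ">tr|A1Z6G9|A1Z6G9_DROME CG8245 OS=Drosophila melanogaster GN=CG8245-RA PE=2 SV=1"

# Hypothetical helper: return the value of a KEY=... field, reading up to
# the next " XX=" key or the end of the line; NA if the key is absent.
# Assumes keys are two uppercase letters followed by "=".
get_field <- function(annot, key) {
  has_key <- grepl(paste0("\\b", key, "="), annot, perl = TRUE)
  pattern <- paste0("^.*\\b", key, "=(.*?)(\\s+[A-Z]{2}=.*)?$")
  ifelse(has_key, sub(pattern, "\\1", annot, perl = TRUE), NA_character_)
}

organism <- get_field(annot, "OS")
gene     <- get_field(annot, "GN")

# The first whitespace-delimited token holds the pipe-separated identifiers
id_token <- sub("^>(\\S+).*$", "\\1", annot)
id_parts <- strsplit(id_token, "|", fixed = TRUE)[[1]]

print(organism)
print(gene)
print(id_parts)
```

Because sub and grepl are vectorised, get_field can also be applied to a whole character vector of annotation lines at once.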
