Read table with comment lines starting with "##"
Question
I'm struggling to read my tables in Variant Call Format (VCF) with R. Each file has some comment lines starting with ##, and then the header starting with #.
## contig=<ID=OTU1431,length=253>
## contig=<ID=OTU915,length=253>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT /home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam
Eubacterium_ruminantium_AB008552 56 . C T 228 . DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59 GT:PL 1/1:255,255,0
How can I read such a table without losing the header? Using read.table() with comment.char = "##" returns an error: "invalid 'comment.char' argument"
Recommended answer
If you want to read VCF, you can also just try readVcf from the VariantAnnotation package in Bioconductor.
https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html
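A minimal sketch of the readVcf route, assuming VariantAnnotation is already installed from Bioconductor. It reads the example file bundled with the package rather than the file from the question, and the genome label "hg19" belongs to that bundled example, not to your data.

```r
library(VariantAnnotation)

# Example VCF shipped with the package (stands in for your own file path).
vcf_file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# The second argument is a genome label; "hg19" matches the bundled example.
vcf <- readVcf(vcf_file, "hg19")

# The "##" meta lines are parsed into the header object.
header(vcf)

# Fixed fields (position, REF, ALT, ...), INFO fields and genotypes.
head(rowRanges(vcf), 3)
head(info(vcf), 3)
head(geno(vcf)$GT, 3)
```

The result is a VCF object rather than a plain data frame, so the columns are reached through accessors such as rowRanges(), info() and geno().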
Otherwise, I can highly recommend the fread function in the data.table package. Its skip argument accepts a string, in which case importing starts at the first line that contains that substring.
For example,
fread("test.vcf", skip = "CHROM")
should work.
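To try this without a real file, here is a self-contained sketch, assuming data.table is installed. The file contents are a made-up miniature of the example in the question; the second variant row and the sample column name sample1 are hypothetical.

```r
library(data.table)

# Hypothetical miniature VCF, tab-separated as real VCF files are.
vcf_lines <- c(
  "##contig=<ID=OTU1431,length=253>",
  "##contig=<ID=OTU915,length=253>",
  paste(c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
          "INFO", "FORMAT", "sample1"), collapse = "\t"),
  paste(c("Eubacterium_ruminantium_AB008552", "56", ".", "C", "T", "228", ".",
          "DP=212;MQ=59", "GT:PL", "1/1:255,255,0"), collapse = "\t"),
  paste(c("Eubacterium_ruminantium_AB008552", "100", ".", "A", "G", "50", ".",
          "DP=30;MQ=60", "GT:PL", "0/1:80,0,90"), collapse = "\t")
)
vcf_path <- tempfile(fileext = ".vcf")
writeLines(vcf_lines, vcf_path)

# skip = "CHROM" starts reading at the first line containing "CHROM",
# so the "##" lines are dropped and the "#CHROM" line becomes the header.
variants <- fread(vcf_path, skip = "CHROM", sep = "\t", header = TRUE)

# The first column name keeps its leading "#"; rename it for convenience.
setnames(variants, "#CHROM", "CHROM")
print(variants)
```

Because reading begins at the header line itself, no comment character needs to be configured at all.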