Read table with comment lines starting with "##"

Problem description

I'm struggling to read my tables in Variant Call Format (VCF) with R. Each file has some comment lines starting with ##, followed by a header line starting with a single #.

## contig=<ID=OTU1431,length=253>
## contig=<ID=OTU915,length=253>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  /home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam
Eubacterium_ruminantium_AB008552    56  .   C   T   228 .   DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59    GT:PL   1/1:255,255,0

How can I read such a table without losing the header? Using read.table() with comment.char = "##" returns an error: "invalid 'comment.char' argument"

Recommended answer

If you want to read VCF files, you can also try readVcf from the VariantAnnotation package in Bioconductor: https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html

Otherwise, I can highly recommend the fread function in the data.table package. Its skip argument accepts a string, in which case fread starts importing at the first line that contains that substring.

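For example,

library(data.table)
fread("test.vcf", skip = "CHROM")

should work. Note that the first column keeps the name #CHROM, including the leading #.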
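If data.table is not available, the same result can probably be reached with base R alone. The following is a minimal sketch, not part of the original answer: it drops the "##" lines by hand and then passes the rest to read.table() with comment characters disabled. The helper name read_vcf_table, the sample name sample1, and the shortened INFO field are made up for illustration.

```r
# Sample VCF text modelled on the excerpt in the question (tab-separated).
# "sample1" and the shortened INFO field are placeholders.
vcf_lines <- c(
  "##contig=<ID=OTU1431,length=253>",
  "##contig=<ID=OTU915,length=253>",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
  "Eubacterium_ruminantium_AB008552\t56\t.\tC\tT\t228\t.\tDP=212;MQ=59\tGT:PL\t1/1:255,255,0"
)

# Hypothetical helper (not from any package): read a VCF-like file with base R.
read_vcf_table <- function(path) {
  lines <- readLines(path)
  # Drop the "##" meta lines; the single-"#" header line is kept.
  lines <- lines[!startsWith(lines, "##")]
  # Strip the leading "#" so the first column is named CHROM.
  lines[1] <- sub("^#", "", lines[1])
  tab <- read.table(
    text = lines,
    header = TRUE,
    sep = "\t",
    quote = "",
    comment.char = "",          # do not treat "#" as a comment marker
    check.names = FALSE,        # keep sample names (e.g. file paths) unchanged
    colClasses = "character"    # avoid alleles "T"/"F" being read as logical
  )
  # Convert the position back to a number after reading everything as text.
  tab$POS <- as.integer(tab$POS)
  tab
}

# Write the sample to a temporary file and read it back.
tmp <- tempfile(fileext = ".vcf")
writeLines(vcf_lines, tmp)
vcf <- read_vcf_table(tmp)
print(names(vcf))
```

Reading every column as character first is a deliberate choice here: with default type conversion, a REF or ALT column containing only T or F would be converted to logical values.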
