Subsetting a GRanges object by gene IDs held in another R object
Question
I have a GRanges file called "P.obj" from which I want to extract/subset specific gene IDs contained in the column "name". The gene IDs I want to extract are held in the R object "plus", whose column is also called "name". I understand how to subset by overlaps and find overlaps, but I cannot work out how to subset by gene name.
> P.obj
GRangesList of length 4:
$exons
GRanges with 604591 ranges and 2 metadata columns:
seqnames ranges strand | score name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] chr1 [66999066, 66999090] + | 1 ENST00000237247
[2] chr1 [66999929, 67000051] + | 2 ENST00000237247
[3] chr1 [67091530, 67091593] + | 3 ENST00000237247
[4] chr1 [67098753, 67098777] + | 4 ENST00000237247
[5] chr1 [67099763, 67099846] + | 5 ENST00000237247
... ... ... ... ... ... ...
[604587] chr22 [51227323, 51227600] + | 4 ENST00000423888
[604588] chr22 [51222290, 51222500] + | 1 ENST00000480246
[604589] chr22 [51223601, 51223721] + | 2 ENST00000480246
[604590] chr22 [51237083, 51239737] + | 3 ENST00000480246
[604591] chr22 [51237083, 51237551] + | 1 ENST00000427528
...
<3 more elements>
---
seqlengths:
chr1 chr2 chr3 chr4 chr5 chr6 ... chr17 chr18 chr19 chr20 chr21 chr22
NA NA NA NA NA NA ... NA NA NA NA NA NA
> plus
name
1 ENST00000237247
3 ENST00000480246
5 ENST00000427528
I have tried: P.obj[P.obj$name==plus$name]
But I get a warning: Warning message: In is.na(e1) : is.na() applied to non-(list or vector) of type 'NULL'
Recommended answer
The information you want is in the GRanges 'metadata' column, accessible with either mcols() or $. Also, you're looking for set membership (%in%), rather than identity. So:
P.obj[P.obj$name %in% plus$name]
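One caveat, offered as a sketch rather than a definitive fix: the printout in the question shows P.obj to be a GRangesList of length 4, not a single GRanges. On a list-like object, $ looks up a list element by name, so P.obj$name most likely returns NULL, which would explain the warning. In that case the membership test has to be applied to an element such as P.obj$exons. The example below assumes the GenomicRanges package is installed. It builds a small toy object with the same column names as in the question; the five ranges are copied from the printout, everything else is illustrative.

```r
library(GenomicRanges)

# Toy stand-in for the "exons" element of P.obj (a few rows from the question)
exons <- GRanges(
  seqnames = c("chr1", "chr1", "chr22", "chr22", "chr22"),
  ranges = IRanges(
    start = c(66999066, 66999929, 51227323, 51222290, 51237083),
    end   = c(66999090, 67000051, 51227600, 51222500, 51237551)
  ),
  strand = rep("+", 5),
  score = c(1L, 2L, 4L, 1L, 1L),
  name = c("ENST00000237247", "ENST00000237247", "ENST00000423888",
           "ENST00000480246", "ENST00000427528")
)
P.obj <- GRangesList(exons = exons)

# The IDs to keep, in a data.frame with a "name" column as in the question
plus <- data.frame(
  name = c("ENST00000237247", "ENST00000480246", "ENST00000427528"),
  stringsAsFactors = FALSE
)

# The list elements are what `$` sees at the top level
names(P.obj)

# Subset a single element by set membership in its "name" metadata column
keep <- P.obj$exons$name %in% plus$name
exons_plus <- P.obj$exons[keep]

# Apply the same filter to every element, keeping the GRangesList structure
P.plus <- endoapply(P.obj, function(gr) gr[gr$name %in% plus$name])
```

%in% returns one TRUE/FALSE per range regardless of how long plus$name is, whereas == compares element by element and recycles the shorter vector, which is rarely what is wanted when matching against a list of IDs.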
Consider asking questions about Bioconductor packages on the Bioconductor support site.