Subsetting a GRanges object by gene IDs stored in another R object

Question

I have a GRanges object called "P.obj" from which I want to extract the ranges whose "name" column contains specific gene IDs. The gene IDs I want are stored in the R object "plus", in a column that is also called "name". I understand how to subset by overlaps and how to find overlaps, but I cannot work out how to subset by gene name.

> P.obj
GRangesList of length 4:
$exons
GRanges with 604591 ranges and 2 metadata columns:
           seqnames               ranges strand   |     score            name
              <Rle>            <IRanges>  <Rle>   | <integer>     <character>
       [1]     chr1 [66999066, 66999090]      +   |         1 ENST00000237247
       [2]     chr1 [66999929, 67000051]      +   |         2 ENST00000237247
       [3]     chr1 [67091530, 67091593]      +   |         3 ENST00000237247
       [4]     chr1 [67098753, 67098777]      +   |         4 ENST00000237247
       [5]     chr1 [67099763, 67099846]      +   |         5 ENST00000237247
       ...      ...                  ...    ... ...       ...             ...
  [604587]    chr22 [51227323, 51227600]      +   |         4 ENST00000423888
  [604588]    chr22 [51222290, 51222500]      +   |         1 ENST00000480246
  [604589]    chr22 [51223601, 51223721]      +   |         2 ENST00000480246
  [604590]    chr22 [51237083, 51239737]      +   |         3 ENST00000480246
  [604591]    chr22 [51237083, 51237551]      +   |         1 ENST00000427528

...
<3 more elements>
---
seqlengths:
  chr1  chr2  chr3  chr4  chr5  chr6 ... chr17 chr18 chr19 chr20 chr21 chr22
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

> plus
             name
1 ENST00000237247
3 ENST00000480246
5 ENST00000427528

I have tried: P.obj[P.obj$name==plus$name]

But I get this message: Warning message: In is.na(e1) : is.na() applied to non-(list or vector) of type 'NULL'

Recommended answer

The information you want is in the GRanges 'metadata' columns, accessible with either mcols() or $. Also, you are looking for set membership (%in%) rather than equality (==): == compares the two vectors element by element, recycling the shorter one, whereas %in% asks whether each name appears anywhere in the other vector. So

P.obj[P.obj$name %in% plus$name]
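
To see why == and %in% give different results here, the following is a minimal base R sketch. It uses a plain character vector `ids` as a stand-in for the "name" metadata column and a small data frame `wanted` as a stand-in for "plus"; both names and their contents are illustrative assumptions, not the asker's actual data.

```r
# Stand-in for the "name" metadata column (hypothetical values
# taken from the IDs shown in the question's printout).
ids <- c("ENST00000237247", "ENST00000237247", "ENST00000423888",
         "ENST00000480246", "ENST00000427528")

# Stand-in for the "plus" object: a data frame with a "name" column.
wanted <- data.frame(
  name = c("ENST00000237247", "ENST00000480246", "ENST00000427528"),
  stringsAsFactors = FALSE
)

# == compares position by position and recycles the shorter vector,
# so it only matches when the same ID happens to sit at the same index.
# R warns about the mismatched lengths; the warning is suppressed here.
by_equality <- suppressWarnings(ids == wanted$name)

# %in% tests, for each element of ids, whether it occurs anywhere
# in wanted$name, which is the membership test the answer recommends.
by_membership <- ids %in% wanted$name

print(by_equality)
print(by_membership)

# The logical mask can then be used for subsetting.
print(ids[by_membership])
```

The same kind of logical mask is what the bracket in the answer's expression receives. Note that the P.obj printed in the question is a GRangesList, where $ selects a list element (such as exons) rather than a metadata column, so the mask may need to be built on one element at a time, for example P.obj$exons[P.obj$exons$name %in% plus$name].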

Consider asking questions about Bioconductor packages on the Bioconductor support site.
