Modify r object with rpy2

Question

I'm trying to use rpy2 to use the DESeq2 R/Bioconductor package in Python.

I actually solved my problem while writing my question (using do_slot gives access to the attributes of R objects), but I think the example might be useful for others, so here is how I do it in R and how this translates to Python:

I can create a "DESeqDataSet" from two data frames as follows:

counts_data <- read.table("long/path/to/file",
                           header=TRUE, row.names="gene")
head(counts_data)
##       WT_RT_1 WT_RT_2 prg1_RT_1 prg1_RT_2
## aap-1     406     311        41        95
## aat-1       5       8         2         0
## aat-2       1       1         0         0
## aat-3      13      12         0         1
## aat-4       6       6         2         3
## aat-5       3       1         1         0

col_data <- DataFrame(lib = c("WT", "WT", "prg1", "prg1"),
                      treat = c("RT", "RT", "RT", "RT"),
                      rep = c("1", "2", "1", "2"), 
                      row.names = colnames(counts_data))
head(col_data)
## DataFrame with 4 rows and 3 columns
##                   lib       treat         rep
##           <character> <character> <character>
## WT_RT_1            WT          RT           1
## WT_RT_2            WT          RT           2
## prg1_RT_1        prg1          RT           1
## prg1_RT_2        prg1          RT           2
dds <- DESeqDataSetFromMatrix(countData = counts_data,
                              colData = col_data,
                              design = ~ lib)
## Warning message:
## In DESeqDataSet(se, design = design, ignoreRank) :
## some variables in design formula are characters, converting to factors

dds
## class: DESeqDataSet 
## dim: 18541 4 
## metadata(1): version
## assays(1): counts
## rownames(18541): aap-1 aat-1 ... WBGene00255550 WBGene00255553
## rowData names(0):
## colnames(4): WT_RT_1 WT_RT_2 prg1_RT_1 prg1_RT_2
## colData names(3): lib treat rep

To ensure the analysis will use the correct control, I need to relevel a factor, which can be accessed using the "double brackets" syntax:

dds[["lib"]]
## [1] WT   WT   prg1 prg1
## Levels: prg1 WT

dds[["lib"]] <- relevel(dds[["lib"]], ref="WT")
dds[["lib"]]
## [1] WT   WT   prg1 prg1
## Levels: WT prg1

Then I can run the analysis:

dds <- DESeq(dds)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing

res <- results(dds)

I look at the results for a given gene:

res["his-10",]
## log2 fold change (MAP): lib prg1 vs WT 
## Wald test p-value: lib prg1 vs WT 
## DataFrame with 1 row and 6 columns
##         baseMean log2FoldChange     lfcSE      stat       pvalue        padj
##        <numeric>      <numeric> <numeric> <numeric>    <numeric>   <numeric>
## his-10  586.5464       3.136174 0.2956132  10.60904 2.705026e-26 8.78785e-25

In Python

Now, I would like to do the same in Python with rpy2.

I seem to successfully create the object from pandas dataframes:

import pandas as pd
from rpy2.robjects import r, pandas2ri, Formula
from rpy2.robjects.packages import importr

as_df = r("as.data.frame")  # R function used later to convert the results
deseq2 = importr("DESeq2")

counts_data = pd.read_table("long/path/to/file", index_col=0)
col_data = pd.DataFrame({
    "cond_names" : counts_data.columns,
    "lib" : ["WT", "WT", "prg1", "prg1"],
    "rep" : ["1", "1", "2", "2"],
    "treat" : ["RT", "RT", "RT", "RT"]})
col_data.set_index("cond_names", inplace=True)

pandas2ri.activate()  # makes some conversions automatic
dds = deseq2.DESeqDataSetFromMatrix(
    countData=counts_data,
    colData=col_data,
    design=Formula("~lib"))

In IPython (where I actually ran the previous commands), I can look inside the object using do_slot to try to identify the factor which needs relevelling:

In [229]: tuple(dds.do_slot("colData").slotnames())
Out[229]: ('rownames', 'nrows', 'listData', 'elementType', 'elementMetadata', 'metadata')

In [230]: dds.do_slot("colData").do_slot("listData")
Out[230]: 
R object with classes: ('list',) mapped to:
<ListVector - Python:0x7f2ae2590a08 / R:0x108fcdd0>
[FactorVector, FactorVector, FactorVector]
  lib: <class 'rpy2.robjects.vectors.FactorVector'>
  R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2ae20f1c08 / R:0x136a3920>
[       2,        2,        1,        1]
  rep: <class 'rpy2.robjects.vectors.FactorVector'>
  R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2a9600c948 / R:0x136a30f0>
[       1,        1,        2,        2]
  treat: <class 'rpy2.robjects.vectors.FactorVector'>
  R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2a9600ccc8 / R:0x136a3588>
[       1,        1,        1,        1]
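The chained do_slot calls above can be wrapped in a small helper. This is only a sketch: get_nested_slot is a hypothetical name, not part of rpy2, and FakeS4 is a stand-in class (also hypothetical) so that the example runs without R installed. With a real rpy2 object, the helper would be called as get_nested_slot(dds, "colData", "listData").

```python
from functools import reduce


def get_nested_slot(obj, *slot_names):
    """Follow a chain of slots by calling obj.do_slot(name) at each step.

    Hypothetical helper, not part of rpy2.
    """
    return reduce(lambda current, name: current.do_slot(name), slot_names, obj)


class FakeS4:
    """Stand-in for an rpy2 S4 object (assumption: only do_slot is mimicked)."""

    def __init__(self, **slots):
        self._slots = slots

    def do_slot(self, name):
        # Error handling of the stand-in only; rpy2's behaviour is not reproduced.
        try:
            return self._slots[name]
        except KeyError:
            raise LookupError(f"no slot named {name!r}") from None


# Mimic the nesting seen above: dds -> colData -> listData -> "lib"
fake_dds = FakeS4(colData=FakeS4(listData={"lib": ["WT", "WT", "prg1", "prg1"]}))

list_data = get_nested_slot(fake_dds, "colData", "listData")
print(list_data["lib"])
```

The helper only saves typing when the slot path is deep; it does not change how rpy2 resolves each slot.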

I suppose the factor to relevel is the first one because "lib" was the first column in the col_data dataframe passed to the deseq2.DESeqDataSetFromMatrix function (I realize now that "lib" is actually written in the description of the R object).

The relevel on the attribute accessed via do_slot seems to take effect (the integer codes of lib change from [2, 2, 1, 1] to [1, 1, 2, 2]):

In [231]: dds.do_slot("colData").do_slot("listData")[0] = r.relevel(dds.do_slot("colData").do_slot("listData")[0], ref="WT")

In [232]: dds.do_slot("colData").do_slot("listData")
Out[232]: 
R object with classes: ('list',) mapped to:
<ListVector - Python:0x7f2a95078508 / R:0x108fcdd0>
[FactorVector, FactorVector, FactorVector]
  lib: <class 'rpy2.robjects.vectors.FactorVector'>
  R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2a9600bb88 / R:0x12a7ff60>
[       1,        1,        2,        2]
  rep: <class 'rpy2.robjects.vectors.FactorVector'>
  R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2ae2568888 / R:0x136a30f0>
[       1,        1,        2,        2]
  treat: <class 'rpy2.robjects.vectors.FactorVector'>
  R object with classes: ('factor',) mapped to:
<FactorVector - Python:0x7f2ae2568848 / R:0x136a3588>
[       1,        1,        1,        1]
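The change in the integer codes of lib can be reproduced in plain Python. The sketch below is not R's or rpy2's implementation: relevel and codes are illustrative functions written for this example, assuming that a factor stores 1-based positions into its levels and that relevel moves the reference level to the front while keeping the order of the others.

```python
def codes(values, levels):
    """Return the 1-based integer codes of `values` for the given level order."""
    return [levels.index(value) + 1 for value in values]


def relevel(values, levels, ref):
    """Move `ref` to the front of `levels` and recompute the codes.

    Illustrative re-implementation of the idea behind R's relevel().
    """
    if ref not in levels:
        raise ValueError(f"{ref!r} is not one of the levels {levels}")
    new_levels = [ref] + [level for level in levels if level != ref]
    return new_levels, codes(values, new_levels)


lib = ["WT", "WT", "prg1", "prg1"]
levels_before = ["prg1", "WT"]  # level order reported by R before relevelling

print(codes(lib, levels_before))  # [2, 2, 1, 1]

new_levels, new_codes = relevel(lib, levels_before, ref="WT")
print(new_levels)  # ['WT', 'prg1']
print(new_codes)   # [1, 1, 2, 2]
```

The values themselves are unchanged; only the level order, and therefore the codes, differ, which is what the two printouts of listData show.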

Then I run the analysis part:

In [233]: dds = deseq2.DESeq(dds)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: estimating size factors

  warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: estimating dispersions

  warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: gene-wise dispersion estimates

  warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: mean-dispersion relationship

  warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: final dispersion estimates

  warnings.warn(x, RRuntimeWarning)
/home/bli/.local/lib/python3.6/site-packages/rpy2/rinterface/__init__.py:186: RRuntimeWarning: fitting model and testing

  warnings.warn(x, RRuntimeWarning)

In [234]: res = pandas2ri.ri2py(as_df(deseq2.results(dds)))

In [235]: res.index.names = ["gene"]

Now, check the results for a test gene:

In [236]: res.loc["his-10"]
Out[236]: 
baseMean          5.865464e+02
log2FoldChange    3.136174e+00
lfcSE             2.956132e-01
stat              1.060904e+01
pvalue            2.705026e-26
padj              8.787850e-25
Name: his-10, dtype: float64

The results returned by Python are the same as those from R.
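This comparison can also be made programmatically rather than by eye. The sketch below copies the values printed for his-10 in the R and Python sessions above; same_results and the 1e-6 relative tolerance are choices made for this example, not part of DESeq2 or rpy2.

```python
import math

# Values for his-10 as printed by R
r_result = {
    "baseMean": 586.5464,
    "log2FoldChange": 3.136174,
    "lfcSE": 0.2956132,
    "stat": 10.60904,
    "pvalue": 2.705026e-26,
    "padj": 8.78785e-25,
}

# Values for his-10 as printed by pandas
py_result = {
    "baseMean": 5.865464e+02,
    "log2FoldChange": 3.136174e+00,
    "lfcSE": 2.956132e-01,
    "stat": 1.060904e+01,
    "pvalue": 2.705026e-26,
    "padj": 8.787850e-25,
}


def same_results(a, b, rel_tol=1e-6):
    """Check that both dicts have the same keys and numerically close values."""
    return a.keys() == b.keys() and all(
        math.isclose(a[key], b[key], rel_tol=rel_tol) for key in a
    )


print(same_results(r_result, py_result))  # True
```

A relative tolerance is used because the p-values are far too small for an absolute tolerance to be meaningful.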

Recommended answer

I found code examples in the rpy2 documentation that helped me solve the problem: http://rpy2.readthedocs.io/en/version_2.8.x/rinterface.html#pass-by-value-paradigm.

One can access the attributes of R objects via the do_slot method, which takes the attribute name as its argument. See the question for the full solution.

There is also a do_slot_assign method, which can be used, for instance, to change the design formula:

>>> dds.do_slot("design").r_repr()
'~lib'
>>> dds.do_slot_assign("design", Formula("~ treat"))
>>> dds.do_slot("design").r_repr()
'~treat'
