Tidyverse gather with row data from another data frame


Problem description

I have been searching for quite some time for an elegant solution to this problem, to no avail, so I decided to ask here.

I am using tidyverse and its gather function to convert a matrix containing intensity values from different samples into long format, in preparation for plotting with ggplot.

There are two types of annotation: 'row-based' annotation of the data, corresponding to genes, and 'column-based' annotation, corresponding to sample information. The column-based information is stored in a separate data frame.

With gather it is easy to convert the values and the row-based annotations to long format:

> df <- data.frame(annot=c("A", "B", "C", "D"), sample1=c(1,1,4,2), sample2=c(3,5,4,5))
> df
  annot sample1 sample2
1     A       1       3
2     B       1       5
3     C       4       4
4     D       2       5
> df %>% gather(sample, value, -annot)
  annot  sample value
1     A sample1     1
2     B sample1     1
3     C sample1     4
4     D sample1     2
5     A sample2     3
6     B sample2     5
7     C sample2     4
8     D sample2     5

The sample information is trickier. It is stored in a separate data frame:

> sample_info <- data.frame(sample=c("sample1", "sample2"), condition=c("infected", "uninfected"))
> sample_info
   sample  condition
1 sample1   infected
2 sample2 uninfected

The desired end result would look like the following:

  annot  sample value  condition
1     A sample1     1   infected
2     B sample1     1   infected
3     C sample1     4   infected
4     D sample1     2   infected
5     A sample2     3 uninfected
6     B sample2     5 uninfected
7     C sample2     4 uninfected
8     D sample2     5 uninfected

I can achieve this by post-processing the data frame: after generating the long data frame, I map each sample name to its condition row by row. I am looking for a neater solution, ideally using the tidyverse packages. Does anyone know an elegant way to achieve this?

Recommended answer

The *_join functions from dplyr (loaded with tidyverse) are great for solving lots of problems involving more than one data frame. Here, left_join matches each row of the long data frame to sample_info on the shared sample column and appends the corresponding condition:

> df %>%
      gather(sample, value, -annot) %>%
      left_join(sample_info, by = 'sample')

  annot  sample value  condition
1     A sample1     1   infected
2     B sample1     1   infected
3     C sample1     4   infected
4     D sample1     2   infected
5     A sample2     3 uninfected
6     B sample2     5 uninfected
7     C sample2     4 uninfected
8     D sample2     5 uninfected
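
In current tidyr releases gather is documented as superseded by pivot_longer, so the same reshape-then-join can also be written as below. This is only a sketch built on the example data from the question; the names long_df and unmatched are made up for illustration, and the anti_join check is an optional extra that the original answer does not include.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- data.frame(annot = c("A", "B", "C", "D"),
                 sample1 = c(1, 1, 4, 2),
                 sample2 = c(3, 5, 4, 5),
                 stringsAsFactors = FALSE)
sample_info <- data.frame(sample = c("sample1", "sample2"),
                          condition = c("infected", "uninfected"),
                          stringsAsFactors = FALSE)

# Reshape every column except annot to long format, then attach the
# condition for each sample from sample_info
long_df <- df %>%
  pivot_longer(cols = -annot, names_to = "sample", values_to = "value") %>%
  left_join(sample_info, by = "sample")

# Optional check (hypothetical helper name): rows whose sample has no entry
# in sample_info; left_join would give these an NA condition
unmatched <- anti_join(long_df, sample_info, by = "sample")
```

Two differences from the gather version are worth keeping in mind: pivot_longer returns a tibble rather than a plain data frame, and it orders rows by the original row (A/sample1, A/sample2, B/sample1, ...) instead of by sample. Neither affects plotting with ggplot.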
