Tidyverse gather with row data from another data frame
Question
I have been searching for quite some time for an elegant solution to this problem, to no avail, so I decided to ask here.
I am using tidyverse and its gather function to convert a matrix of intensity values from different samples into long format, in preparation for plotting with ggplot.
There are two types of annotation: 'row-based' annotation of the data, corresponding to genes, and 'column-based' annotation, corresponding to sample information. The column-based information is stored in a separate data frame.
Using gather, it is easy to convert the values and row-based annotations to long format.
> df <- data.frame(annot=c("A", "B", "C", "D"), sample1=c(1,1,4,2), sample2=c(3,5,4,5))
> df
  annot sample1 sample2
1     A       1       3
2     B       1       5
3     C       4       4
4     D       2       5
> df %>% gather(sample, value, -annot)
  annot  sample value
1     A sample1     1
2     B sample1     1
3     C sample1     4
4     D sample1     2
5     A sample2     3
6     B sample2     5
7     C sample2     4
8     D sample2     5
The sample information is trickier. It is stored in a separate data frame:
> sample_info <- data.frame(sample=c("sample1", "sample2"), condition=c("infected", "uninfected"))
> sample_info
   sample  condition
1 sample1   infected
2 sample2 uninfected
The desired end result would look like the following:
  annot  sample value  condition
1     A sample1     1   infected
2     B sample1     1   infected
3     C sample1     4   infected
4     D sample1     2   infected
5     A sample2     3 uninfected
6     B sample2     5 uninfected
7     C sample2     4 uninfected
8     D sample2     5 uninfected
I can achieve this by post-processing the data frame: after generating the long data frame, I map each sample name to its condition row by row. I am looking for a neater solution, ideally using the tidyverse package. Does anyone know an elegant way to achieve this?
Recommended answer
The *_join functions from dplyr (loaded with tidyverse) are great for solving lots of problems involving more than one data frame. Here, left_join keeps every row of the gathered data and adds the matching condition from sample_info, matched on the shared sample column.
> df %>%
    gather(sample, value, -annot) %>%
    left_join(sample_info, by = 'sample')
  annot  sample value  condition
1     A sample1     1   infected
2     B sample1     1   infected
3     C sample1     4   infected
4     D sample1     2   infected
5     A sample2     3 uninfected
6     B sample2     5 uninfected
7     C sample2     4 uninfected
8     D sample2     5 uninfected
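As a side note, gather is superseded in tidyr 1.0.0 and later, where pivot_longer is the suggested replacement. The sketch below shows the same reshape-then-join pattern with pivot_longer, assuming a recent tidyr and dplyr are installed. The names long and missing_info are only illustrative and not part of the original question. Note that pivot_longer orders rows by annot rather than by sample, so the rows come out in a different order than with gather.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(annot = c("A", "B", "C", "D"),
                 sample1 = c(1, 1, 4, 2),
                 sample2 = c(3, 5, 4, 5))
sample_info <- data.frame(sample = c("sample1", "sample2"),
                          condition = c("infected", "uninfected"))

# Reshape every column except annot to long format,
# then attach the per-sample annotation by the shared "sample" column
long <- df %>%
  pivot_longer(-annot, names_to = "sample", values_to = "value") %>%
  left_join(sample_info, by = "sample")

# Optional sanity check (illustrative): samples that found no match
# in sample_info end up with NA in condition after a left join
missing_info <- long %>%
  filter(is.na(condition)) %>%
  distinct(sample)
```

If missing_info has zero rows, every sample column in df had a matching row in sample_info. A left join is used so that unmatched samples are kept with NA rather than silently dropped, which is what inner_join would do.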