Manipulate a data frame where there are multiple columns for each experiment

Views: 98 | Published: 2017/3/25 23:39:36 | Tags: r, dataframe

Problem description

I have many sequencing experiments, each with multiple results for each of a few hundred genes. When the data is output from another programme, it isn't in a useful format for me: all the experiments and their results are listed along the top, and there is one row for each gene. I have written an example data set, together with how I am currently solving the problem, but I would like a more efficient method as my data sets are very large.

 col1<- c("","", "gene1", "gene2", "gene3", "gene4")
 col2<- c("Experiment1", "Part 1", "a","b","c","d")
 col3<- c("Experiment1", "Part 2", "e", "f", "g", "h")
 col4<- c("Experiment2", "Part 1", "i", "j", "k", "l")
 col5<- c("Experiment2", "Part 2", "m", "n", "o", "p")
 pp<- data.frame(col1,col2,col3,col4,col5)
 one<-data.frame(pp$col1, pp$col2)
 onetwo<- data.frame(pp$col1,pp$col3)
 two<-data.frame(pp$col1, pp$col4)
 twotwo<-data.frame(pp$col1,pp$col5)

 one$V3[3:6]<-as.character(one[2,2])
 one<-one[-2,]
 one<-one[-1,]
 colnames(one)<- c("gene", "Experiment 1", "part")

 onetwo$V3[3:6]<-as.character(onetwo[2,2])
 onetwo<-onetwo[-2,]
 onetwo<-onetwo[-1,]
 colnames(onetwo)<- c("gene", "Experiment 1", "part")

 x1<-rbind(one, onetwo)

 two$V3[3:6]<-as.character(two[2,2])
 two<-two[-2,]
 two<-two[-1,]
 colnames(two)<- c("gene", "Experiment 2", "part")


 twotwo$V3[3:6]<-as.character(twotwo[2,2])
 twotwo<-twotwo[-2,]
 twotwo<-twotwo[-1,]
 colnames(twotwo)<- c("gene", "Experiment 2", "part")

 x2<-rbind(two, twotwo)

 x3<-merge(x1,x2)

I apologise for the large amount of code, but I am unable to describe this operation more precisely in words. pp is the example data frame and x3 is the format I require. Is there a better way to do this?

Solution

This might be a shorter way to do it. Transpose pp, drop the first row (the gene labels), and rename the columns:

pp.new <- as.data.frame(t(pp)[-1,], row.names = 1)
names(pp.new) <- c("experiment", "part", "gene1", "gene2", "gene3", "gene4")

which gives:

> pp.new
   experiment   part gene1 gene2 gene3 gene4
1 Experiment1 Part 1     a     b     c     d
2 Experiment1 Part 2     e     f     g     h
3 Experiment2 Part 1     i     j     k     l
4 Experiment2 Part 2     m     n     o     p
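The transpose step can also be spelled out one operation at a time. The following is only a sketch of the same idea in base R: it rebuilds the example pp with stringsAsFactors = FALSE (an assumption, chosen so that every column stays character), resets the row names explicitly, and reads the gene names from the data instead of typing them out.

```r
# Rebuild the example data frame from the question.
# stringsAsFactors = FALSE is an assumption made for this sketch.
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

# t() turns the data frame into a character matrix with one row
# per original column (col1 ... col5).
m <- t(pp)

# Drop the first row (the gene labels) and keep the four experiment rows.
pp.new <- as.data.frame(m[-1, ], stringsAsFactors = FALSE)

# Replace the row names col2 ... col5 with the default 1 ... 4.
rownames(pp.new) <- NULL

# Name the columns; the gene names come from col1 of the original data.
names(pp.new) <- c("experiment", "part", pp$col1[-(1:2)])

print(pp.new)
```

Taking the gene names from pp$col1 avoids hard-coding several hundred names when the real data set is used.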

However, it is probably better to transform this into long format with the reshape2 package:

library(reshape2)    
pp.long <- melt(pp.new, id=c("experiment","part"))

which results in:

> pp.long
    experiment   part variable value
1  Experiment1 Part 1    gene1     a
2  Experiment1 Part 2    gene1     e
3  Experiment2 Part 1    gene1     i
4  Experiment2 Part 2    gene1     m
5  Experiment1 Part 1    gene2     b
6  Experiment1 Part 2    gene2     f
7  Experiment2 Part 1    gene2     j
8  Experiment2 Part 2    gene2     n
9  Experiment1 Part 1    gene3     c
10 Experiment1 Part 2    gene3     g
11 Experiment2 Part 1    gene3     k
12 Experiment2 Part 2    gene3     o
13 Experiment1 Part 1    gene4     d
14 Experiment1 Part 2    gene4     h
15 Experiment2 Part 1    gene4     l
16 Experiment2 Part 2    gene4     p
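If reshape2 is not installed, roughly the same long layout can be produced with base R alone. This is a sketch of an alternative, not the method from the answer; the names gene.cols and pp.long.base are made up for illustration, and pp.new is rebuilt by hand so that the block runs on its own.

```r
# pp.new as printed earlier, rebuilt by hand so the block is self-contained.
pp.new <- data.frame(
  experiment = c("Experiment1", "Experiment1", "Experiment2", "Experiment2"),
  part       = c("Part 1", "Part 2", "Part 1", "Part 2"),
  gene1      = c("a", "e", "i", "m"),
  gene2      = c("b", "f", "j", "n"),
  gene3      = c("c", "g", "k", "o"),
  gene4      = c("d", "h", "l", "p"),
  stringsAsFactors = FALSE
)

# Every column that is not an identifier is treated as a gene column.
gene.cols <- setdiff(names(pp.new), c("experiment", "part"))

# Stack the gene columns on top of each other and repeat the
# identifier columns once per gene.
pp.long.base <- data.frame(
  experiment = rep(pp.new$experiment, times = length(gene.cols)),
  part       = rep(pp.new$part, times = length(gene.cols)),
  variable   = rep(gene.cols, each = nrow(pp.new)),
  value      = unlist(pp.new[gene.cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

print(pp.long.base)
```

The row order follows the melt output shown above: all rows for gene1 first, then gene2, and so on.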

If you want output comparable to x3, you can use the recast function (also from the reshape2 package):

recast(pp.new, part + variable ~ experiment, id.var=c("experiment","part"), value.var = "value")

which gives:

    part variable Experiment1 Experiment2
1 Part 1    gene1           a           i
2 Part 1    gene2           b           j
3 Part 1    gene3           c           k
4 Part 1    gene4           d           l
5 Part 2    gene1           e           m
6 Part 2    gene2           f           n
7 Part 2    gene3           g           o
8 Part 2    gene4           h           p
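For comparison, a similar wide table can be sketched with the base R reshape function instead of recast. This is an alternative technique, not the one used in the answer; the name pp.wide is made up for illustration, and the long data is rebuilt by hand so that the block is self-contained. reshape prefixes the new columns with "value.", so the sketch strips that prefix afterwards.

```r
# Long-format data as printed earlier, rebuilt by hand.
pp.long <- data.frame(
  experiment = rep(c("Experiment1", "Experiment1",
                     "Experiment2", "Experiment2"), times = 4),
  part       = rep(c("Part 1", "Part 2"), times = 8),
  variable   = rep(paste0("gene", 1:4), each = 4),
  value      = letters[c(1, 5, 9, 13, 2, 6, 10, 14,
                         3, 7, 11, 15, 4, 8, 12, 16)],
  stringsAsFactors = FALSE
)

# One row per (part, variable) pair, one value column per experiment.
pp.wide <- reshape(pp.long,
                   idvar     = c("part", "variable"),
                   timevar   = "experiment",
                   direction = "wide")

# Sort like the recast output and tidy the column and row names.
pp.wide <- pp.wide[order(pp.wide$part, pp.wide$variable), ]
names(pp.wide) <- sub("^value\\.", "", names(pp.wide))
rownames(pp.wide) <- NULL

print(pp.wide)
```

With several hundred genes the reshape2 functions are usually more convenient, but the base version avoids an extra dependency.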
