Manipulate a data frame where there are multiple columns for each experiment
I have many sequencing experiments, each with multiple results for each of a few hundred genes. When the data is output from another programme it isn't in a useful format for me: all the experiments and their results are listed along the top, and there is one row for each gene. I have written an example data set and my current solution below, but I would like a more efficient method as my data sets are very large.
col1<- c("","", "gene1", "gene2", "gene3", "gene4")
col2<- c("Experiment1", "Part 1", "a","b","c","d")
col3<- c("Experiment1", "Part 2", "e", "f", "g", "h")
col4<- c("Experiment2", "Part 1", "i", "j", "k", "l")
col5<- c("Experiment2", "Part 2", "m", "n", "o", "p")
pp<- data.frame(col1,col2,col3,col4,col5)
one<-data.frame(pp$col1, pp$col2)
onetwo<- data.frame(pp$col1,pp$col3)
two<-data.frame(pp$col1, pp$col4)
twotwo<-data.frame(pp$col1,pp$col5)
one$V3[3:6]<-as.character(one[2,2])
one<-one[-2,]
one<-one[-1,]
colnames(one)<- c("gene", "Experiment 1", "part")
onetwo$V3[3:6]<-as.character(onetwo[2,2])
onetwo<-onetwo[-2,]
onetwo<-onetwo[-1,]
colnames(onetwo)<- c("gene", "Experiment 1", "part")
x1<-rbind(one, onetwo)
two$V3[3:6]<-as.character(two[2,2])
two<-two[-2,]
two<-two[-1,]
colnames(two)<- c("gene", "Experiment 2", "part")
twotwo$V3[3:6]<-as.character(twotwo[2,2])
twotwo<-twotwo[-2,]
twotwo<-twotwo[-1,]
colnames(twotwo)<- c("gene", "Experiment 2", "part")
x2<-rbind(two, twotwo)
x3<-merge(x1,x2)
I apologise for the large amount of code but I am unable to verbalise this operation specifically. pp is the example data frame and x3 is the format I require. Is there a better way to do this?
This might be a shorter way to do it:
pp.new <- as.data.frame(t(pp)[-1,], row.names = 1)
names(pp.new) <- c("experiment", "part", "gene1", "gene2", "gene3", "gene4")
which gives:
> pp.new
   experiment   part gene1 gene2 gene3 gene4
1 Experiment1 Part 1     a     b     c     d
2 Experiment1 Part 2     e     f     g     h
3 Experiment2 Part 1     i     j     k     l
4 Experiment2 Part 2     m     n     o     p
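With a few hundred genes, typing the column names by hand is impractical. The following is a minimal sketch that takes them from the data instead, assuming the first column of pp always holds the gene names below the two header rows. The helper name genes and the use of stringsAsFactors = FALSE are my additions, not part of the original answer:

```r
# Example data from the question, kept as character rather than factor
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

# Assumption: gene names sit in the first column, below the two header rows
genes <- pp$col1[-(1:2)]

# Transpose everything except the gene column: one row per experiment/part
pp.new <- as.data.frame(t(pp[, -1]), stringsAsFactors = FALSE)
names(pp.new) <- c("experiment", "part", genes)
rownames(pp.new) <- NULL  # replace the col2..col5 row names with 1..n
```

This should produce the same pp.new as above, without hard-coding the gene names.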
However, it is probably better to transform this into long format with the reshape2 package:
library(reshape2)
pp.long <- melt(pp.new, id=c("experiment","part"))
which results in:
> pp.long
    experiment   part variable value
 1 Experiment1 Part 1    gene1     a
 2 Experiment1 Part 2    gene1     e
 3 Experiment2 Part 1    gene1     i
 4 Experiment2 Part 2    gene1     m
 5 Experiment1 Part 1    gene2     b
 6 Experiment1 Part 2    gene2     f
 7 Experiment2 Part 1    gene2     j
 8 Experiment2 Part 2    gene2     n
 9 Experiment1 Part 1    gene3     c
10 Experiment1 Part 2    gene3     g
11 Experiment2 Part 1    gene3     k
12 Experiment2 Part 2    gene3     o
13 Experiment1 Part 1    gene4     d
14 Experiment1 Part 2    gene4     h
15 Experiment2 Part 1    gene4     l
16 Experiment2 Part 2    gene4     p
If you want output comparable to x3, you can use the recast function (also from the reshape2 package):
recast(pp.new, part + variable ~ experiment, id.var=c("experiment","part"), value.var = "value")
which gives:
    part variable Experiment1 Experiment2
1 Part 1    gene1           a           i
2 Part 1    gene2           b           j
3 Part 1    gene3           c           k
4 Part 1    gene4           d           l
5 Part 2    gene1           e           m
6 Part 2    gene2           f           n
7 Part 2    gene3           g           o
8 Part 2    gene4           h           p
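If reshape2 is not available, roughly the same long and wide tables can be built with base R alone. This is only a sketch under the same layout assumption (two header rows, gene names in the first column). The names header, body and pp.wide are hypothetical, and the long table is ordered by experiment and part rather than by gene:

```r
# Example data from the question, kept as character rather than factor
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

header <- pp[1:2, -1]   # row 1: experiment, row 2: part
body   <- pp[-(1:2), ]  # one row per gene

# Long format: one row per experiment/part/gene combination
pp.long <- data.frame(
  experiment = rep(unlist(header[1, ], use.names = FALSE), each = nrow(body)),
  part       = rep(unlist(header[2, ], use.names = FALSE), each = nrow(body)),
  gene       = rep(body$col1, times = ncol(header)),
  value      = unlist(body[, -1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Wide format comparable to x3: one value column per experiment
pp.wide <- reshape(pp.long, idvar = c("part", "gene"),
                   timevar = "experiment", direction = "wide")
```

Here reshape() names the value columns value.Experiment1 and value.Experiment2, so they may need renaming to match the recast output exactly.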