Reshaping a data frame in R


Problem description


I'm running into difficulties reshaping a large data frame. I've been relatively fortunate in avoiding reshaping problems in the past, which also means I'm terrible at it.

My current data frame looks something like this:

unique_id    seq   response    detailed.name    treatment 
a            N1     123.23     descr. of N1     T1
a            N2     231.12     descr. of N2     T1
a            N3     231.23     descr. of N3     T1
...
b            N1     343.23     descr. of N1     T2
b            N2     281.13     descr. of N2     T2
b            N3     901.23     descr. of N3     T2
...

And I'd like:

seq    detailed.name   T1           T2
N1     descr. of N1    123.23       343.23
N2     descr. of N2    231.12       281.13
N3     descr. of N3    231.23       901.23

I've looked into the reshape package, but I'm not sure how I can convert the treatment factors into individual column names.

Thanks!

Edit: I tried running this on my local machine (4 GB dual-core iMac, 3.06 GHz) and it keeps failing with:

> d.tmp.2 <- cast(d.tmp, `SEQ_ID` + `GENE_INFO` ~ treatments)
Aggregation requires fun.aggregate: length used as default
R(5751) malloc: *** mmap(size=647168) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

I'll try running this on one of our bigger machines when I get a chance.

Solution

reshape always seems tricky to me too, but it always seems to work with a little trial and error. Here's what I ended up finding:

> x
  unique_id seq response detailed.name treatment
1         a  N1   123.23           dN1        T1
2         a  N2   231.12           dN2        T1
3         a  N3   231.23           dN3        T1
4         b  N1   343.23           dN1        T2
5         b  N2   281.13           dN2        T2
6         b  N3   901.23           dN3        T2

> x2 <- melt(x, c("seq", "detailed.name", "treatment"), "response")
> x2
  seq detailed.name treatment variable  value
1  N1           dN1        T1 response 123.23
2  N2           dN2        T1 response 231.12
3  N3           dN3        T1 response 231.23
4  N1           dN1        T2 response 343.23
5  N2           dN2        T2 response 281.13
6  N3           dN3        T2 response 901.23

> cast(x2, seq + detailed.name ~ treatment)
  seq detailed.name     T1     T2
1  N1           dN1 123.23 343.23
2  N2           dN2 231.12 281.13
3  N3           dN3 231.23 901.23

Your original data was already in long format, but not in the long format that melt/cast uses. So I re-melted it. The second argument (id.vars) is the list of columns not to melt. The third argument (measure.vars) is the list of columns that hold the measured values, which melt stacks into the variable and value columns.
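
To make the argument roles explicit, here is a minimal sketch of the same melt call with named arguments. It assumes the reshape package is installed and rebuilds the six-row toy data from the example above; the factor conversion is only there to mirror the older R defaults and is my assumption, not something the answer depends on.

```r
# Assumes the 'reshape' package (the one used in this answer) is installed.
library(reshape)

# Toy data copied from the example above.
x <- data.frame(
  unique_id     = c("a", "a", "a", "b", "b", "b"),
  seq           = c("N1", "N2", "N3", "N1", "N2", "N3"),
  response      = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  detailed.name = c("dN1", "dN2", "dN3", "dN1", "dN2", "dN3"),
  treatment     = c("T1", "T1", "T1", "T2", "T2", "T2"),
  stringsAsFactors = TRUE
)

# id.vars: columns kept as identifiers on every row.
# measure.vars: columns stacked into the 'variable' / 'value' pair.
# unique_id is in neither list, so it is dropped from the result.
x2 <- melt(x,
           id.vars      = c("seq", "detailed.name", "treatment"),
           measure.vars = "response")

str(x2)
```

Naming the arguments is optional, but it makes it harder to swap the two lists by accident.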

Then cast uses a formula. The columns left of the tilde stay as they are, and the columns right of the tilde are used to condition the value column, so each distinct value on the right becomes its own output column.
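
A small self-contained sketch of the cast step, again assuming the reshape package and the toy data from this answer. The duplicate check is my addition: as far as I can tell, the "Aggregation requires fun.aggregate" message quoted in the question is what cast prints when the formula columns do not identify each value uniquely, so it then counts rows instead of spreading them.

```r
# Assumes the 'reshape' package is installed.
library(reshape)

# Molten toy data in the shape melt() produces in this answer.
long <- data.frame(
  seq           = c("N1", "N2", "N3", "N1", "N2", "N3"),
  detailed.name = c("dN1", "dN2", "dN3", "dN1", "dN2", "dN3"),
  treatment     = c("T1", "T1", "T1", "T2", "T2", "T2"),
  variable      = "response",
  value         = c(123.23, 231.12, 231.23, 343.23, 281.13, 901.23),
  stringsAsFactors = TRUE
)

# Each (seq, detailed.name, treatment) combination should occur once;
# anyDuplicated() returns 0 when that holds.
n_dup <- anyDuplicated(long[c("seq", "detailed.name", "treatment")])

# Left of ~ : columns kept as row identifiers.
# Right of ~: column whose distinct values become new columns.
wide <- cast(long, seq + detailed.name ~ treatment)

print(wide)
```

If the duplicate check reports a nonzero index on the real data, adding more identifying columns to the left side of the formula, or passing an explicit fun.aggregate such as mean, is probably the first thing to try.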

More or less...!
