Complicated reshaping
Problem description
I want to reshape my data frame from long to wide format, but I lose some data that I'd like to keep. For the following example:
df<-data.frame(Par1=unlist(strsplit("AABBCCC","")),
Par2=unlist(strsplit("DDEEFFF","")),
ParD=unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh",",")),
Type=unlist(strsplit("pre,post,pre,post,pre,post,post",",")),
Val=c(10,20,30,40,50,60,70))
# Par1 Par2 ParD Type Val
# 1 A D foo pre 10
# 2 A D bar post 20
# 3 B E baz pre 30
# 4 B E qux post 40
# 5 C F bla pre 50
# 6 C F xyz post 60
# 7 C F meh post 70
library(reshape2)  # provides dcast
dfw <- dcast(df,
             formula = Par1 + Par2 ~ Type,
             value.var = "Val",
             fun.aggregate = mean)
# Par1 Par2 post pre
# 1 A D 20 10
# 2 B E 40 30
# 3 C F 65 50
This is almost what I need, but I would also like to have:
- a field that keeps the data from the ParD field (for example, as a single merged string),
- the number of observations used for each aggregation.
That is, I would like the resulting data.frame to look as follows:
# Par1 Par2 post pre Num.pre Num.post ParD
# 1 A D 20 10 1 1 foo_bar
# 2 B E 40 30 1 1 baz_qux
# 3 C F 65 50 1 2 bla_xyz_meh
I would be grateful for any ideas. For example, I tried to solve the second task by passing the following to dcast:
fun.aggregate=function(x) c(Val=mean(x),Num=length(x))
but this causes an error.
Thanks in advance!
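On the error mentioned above: as far as I know, dcast expects fun.aggregate to return a single value per output cell, so a function that returns both a mean and a count is rejected. A minimal sketch of one workaround, assuming reshape2 is installed, is to call dcast once per summary and merge the results. The names `keys`, `means`, `counts`, `is_count` and `wide` are illustrative only.

```r
library(reshape2)  # provides dcast

df <- data.frame(
  Par1 = unlist(strsplit("AABBCCC", "")),
  Par2 = unlist(strsplit("DDEEFFF", "")),
  ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh", ",")),
  Type = unlist(strsplit("pre,post,pre,post,pre,post,post", ",")),
  Val  = c(10, 20, 30, 40, 50, 60, 70)
)

keys <- c("Par1", "Par2")

# One dcast call per summary, each with an aggregate that returns one value.
means  <- dcast(df, Par1 + Par2 ~ Type, value.var = "Val", fun.aggregate = mean)
counts <- dcast(df, Par1 + Par2 ~ Type, value.var = "Val", fun.aggregate = length)

# Prefix the count columns so they do not clash with the mean columns.
is_count <- !(names(counts) %in% keys)
names(counts)[is_count] <- paste0("Num.", names(counts)[is_count])

# Join means and counts on the grouping columns.
wide <- merge(means, counts, by = keys)
print(wide)
```

This covers the counts only; the merged ParD string still needs a separate aggregation step.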
A two-step solution using ddply (I am not happy with it, but it gives the result):
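library(plyr)  # provides ddply and .()
# Per (Par1, Par2) group: merge the ParD values and count pre/post rows
dat <- ddply(df, .(Par1, Par2), function(x) {
  data.frame(ParD = paste(x$ParD, collapse = "_"),
             Num.pre = sum(x$Type == "pre"),
             Num.post = sum(x$Type == "post"))
})
# Join the counts and strings onto the dcast result from the question
merge(dfw, dat)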
Par1 Par2 post pre ParD Num.pre Num.post
1 A D 20 10 foo_bar 1 1
2 B E 40 30 baz_qux 1 1
3 C F 65 50 bla_xyz_meh 1 2
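For comparison, the same table can be built in a single pass with base R only, by splitting the data frame per group and summarising each piece. This is a sketch under the assumption that the example data from the question is used; `groups`, `summarise_group` and `result` are hypothetical names, not part of any package.

```r
df <- data.frame(
  Par1 = unlist(strsplit("AABBCCC", "")),
  Par2 = unlist(strsplit("DDEEFFF", "")),
  ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh", ",")),
  Type = unlist(strsplit("pre,post,pre,post,pre,post,post", ",")),
  Val  = c(10, 20, 30, 40, 50, 60, 70),
  stringsAsFactors = FALSE
)

# One data frame per (Par1, Par2) pair; drop = TRUE skips empty combinations.
groups <- split(df, list(df$Par1, df$Par2), drop = TRUE)

# Hypothetical helper: collapse one group into a single wide row.
# Note: mean() of an empty vector is NaN, so a group without any
# "pre" (or "post") rows would get NaN in that column.
summarise_group <- function(g) {
  data.frame(
    Par1     = g$Par1[1],
    Par2     = g$Par2[1],
    post     = mean(g$Val[g$Type == "post"]),
    pre      = mean(g$Val[g$Type == "pre"]),
    Num.pre  = sum(g$Type == "pre"),
    Num.post = sum(g$Type == "post"),
    ParD     = paste(g$ParD, collapse = "_"),
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(groups, summarise_group))
rownames(result) <- NULL
print(result)
```

The trade-off is that the Type values ("pre", "post") are hard-coded here, whereas dcast discovers them from the data.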