Complicated reshaping


Problem description


    I want to reshape my data frame from long to wide format, but I lose some data that I'd like to keep. For the following example:

    df<-data.frame(Par1=unlist(strsplit("AABBCCC","")),
                   Par2=unlist(strsplit("DDEEFFF","")),
                   ParD=unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh",",")),
                   Type=unlist(strsplit("pre,post,pre,post,pre,post,post",",")),
                   Val=c(10,20,30,40,50,60,70))
    
       #     Par1 Par2 ParD Type Val
       #   1    A    D  foo  pre  10
       #   2    A    D  bar post  20
       #   3    B    E  baz  pre  30
       #   4    B    E  qux post  40
       #   5    C    F  bla  pre  50
       #   6    C    F  xyz post  60
       #   7    C    F  meh post  70
    
    library(reshape2)  # assumed source of dcast(); the question does not name the package

    dfw <- dcast(df,
                 formula = Par1 + Par2 ~ Type,
                 value.var = "Val",
                 fun.aggregate = mean)
    
     #     Par1 Par2 post pre
     #   1    A    D   20  10
     #   2    B    E   40  30
     #   3    C    F   65  50
    

    This is almost what I need, but I would also like to have:

    1. a field that keeps the data from the ParD field (for example, as a single merged string),
    2. the number of observations used for each aggregation.

    That is, I would like the resulting data.frame to look like this:

        #     Par1 Par2 post pre Num.pre Num.post ParD
        #   1    A    D   20  10      1      1    foo_bar 
        #   2    B    E   40  30      1      1    baz_qux
        #   3    C    F   65  50      1      2    bla_xyz_meh
    

    I would be grateful for any ideas. For example, I tried to solve the second task by passing fun.aggregate=function(x) c(Val=mean(x),Num=length(x)) to dcast, but this causes an error.

    Thanks in advance!

    Solution

    A two-step solution using ddply (I am not happy with it, but it gives the result):

    library(plyr)  # provides ddply() and .()

    # For each Par1/Par2 group, merge the ParD labels and count rows per Type
    dat <- ddply(df, .(Par1, Par2), function(x) {
      data.frame(ParD     = paste(x$ParD, collapse = "_"),
                 Num.pre  = sum(x$Type == "pre"),
                 Num.post = sum(x$Type == "post"))
    })
    
    # Join the group summary to the wide table on the shared Par1/Par2 columns
    merge(dfw, dat)
    #   Par1 Par2 post pre        ParD Num.pre Num.post
    # 1    A    D   20  10     foo_bar       1        1
    # 2    B    E   40  30     baz_qux       1        1
    # 3    C    F   65  50 bla_xyz_meh       1        2
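
If avoiding the separate dcast and merge steps is the goal, one possible alternative is to build each wide row directly in base R. The following is only a sketch under the assumptions of the example above: the helper name summarise_group is made up for illustration, and Type is assumed to take only the values "pre" and "post".

```r
# Example data from the question
df <- data.frame(
  Par1 = unlist(strsplit("AABBCCC", "")),
  Par2 = unlist(strsplit("DDEEFFF", "")),
  ParD = unlist(strsplit("foo,bar,baz,qux,bla,xyz,meh", ",")),
  Type = unlist(strsplit("pre,post,pre,post,pre,post,post", ",")),
  Val  = c(10, 20, 30, 40, 50, 60, 70),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one Par1/Par2 group into a single wide row
summarise_group <- function(g) {
  data.frame(
    Par1     = g$Par1[1],
    Par2     = g$Par2[1],
    post     = mean(g$Val[g$Type == "post"]),
    pre      = mean(g$Val[g$Type == "pre"]),
    Num.pre  = sum(g$Type == "pre"),
    Num.post = sum(g$Type == "post"),
    ParD     = paste(g$ParD, collapse = "_"),
    stringsAsFactors = FALSE
  )
}

# Split by the grouping columns, summarise each piece, and stack the rows
groups <- split(df, list(df$Par1, df$Par2), drop = TRUE)
wide <- do.call(rbind, lapply(groups, summarise_group))
rownames(wide) <- NULL
wide
```

For this data the result has one row per Par1/Par2 pair with the means, the counts, and the merged ParD string side by side. A group with no "pre" (or no "post") rows would get NaN for that mean, so real data may need an explicit check.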
    
