How to combine multiple rows into one observation in R

Problem description

I am relatively new to R and I am stuck trying to get my data into a suitable format. The reshape package looks like it might be useful for this, but I haven't gotten any further than that.

I have a data frame in which one of the columns (V4) contains both strings and numbers. I would like to split V4 by the grouping given in V1 and V2 and attach the results to the data frame as three separate columns.

Edit: As my original example data frame did not quite capture the complexity of the problem, here is a more accurate example:

df <- data.frame(V1=c(rep("SN", 8),rep("JK", 4)), 
             V2=c(1,1,2,2,2,3,3,3,1,1,2,2), 
             V3=c("Picture", "Response", "Sound", "Sound", "Response", "Sound", "Sound", "Response", "Sound", "Response", "Sound", "Sound"), 
             V4=c("Photo", "100", "XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", 100, "XYZc02i40", 200, "XYZc02i03", "XYZq02i03" ), 
             stringsAsFactors=FALSE)


 V1 V2       V3        V4
 SN  1  Picture     Photo
 SN  1 Response       100
 SN  2    Sound XYZc02i03
 SN  2    Sound XYZq02i03
 SN  2 Response       200
 SN  3    Sound ZYXc01i30
 SN  3    Sound ZYXq01i30
 SN  3 Response       100
 JK  1    Sound XYZc02i40
 JK  1 Response       200
 JK  2    Sound XYZc02i03
 JK  2    Sound XYZq02i03

And I want to get something like this:

   V1  V2       V3          V4        V5   V6
   SN   1  Picture       Photo        NA  100
   SN   2    Sound   XYZc02i03 XYZq02i03  200
   SN   3    Sound   ZYXc01i30 ZYXq01i30  100
   JK   1    Sound   XYZc02i40        NA  200
   JK   2    Sound   XYZc02i03 XYZq02i03   NA

Edit: I don't always have the same number of observations for each value of V2, which means there could be missing values for V4, V5, or V6 in the data frame I want to get.

Edit 2: V6 should hold the "Response" value from V3; V4 and V5 should ideally hold the "Sound" values from V3, in consecutive order.

I would appreciate any advice on how to go about this. Or, if this problem has been solved elsewhere and I missed it, a link would also be great.

Solution

You don't need a cbind in your definition of df. You'd use something like this:

df <- data.frame(V1=rep("SN", 6), 
                 V2=rep(2:3, each=3), 
                 V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Response"), 
                 V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", 100), 
                 stringsAsFactors=FALSE)
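
A side note on the V4 column above: c() coerces mixed inputs to a single type, so the bare 200 and 100 end up stored as strings. The short check below illustrates this; the vector name v4 is only for illustration.

```r
# Mixing strings and numbers in c() coerces everything to character.
v4 <- c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", 100)

class(v4)   # "character"
v4[3]       # "200"

# Converting back gives NA for the non-numeric entries (R warns about this,
# so the warning is suppressed here).
v4_num <- suppressWarnings(as.numeric(v4))
is.na(v4_num)
```

So V4 in the resulting data frame is a character column, and any numeric use of the response values needs an explicit conversion.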

But given a data frame like the one you describe, you can get the desired result with:

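# Pad every group's V4 values to a common length so aggregate() returns a matrix.
max.subset.len <- 3 # or maybe max(sapply(split(df, list(df$V1, df$V2)), FUN=nrow))
fun <- function(v4) {length(v4) <- max.subset.len; v4} # extending the length pads with NA
agg <- aggregate(df$V4, by=list(df$V1, df$V2), FUN=fun)
# agg[[3]] is a matrix with one column per position within the group.
results <- cbind(agg[1:2], agg[[3]])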
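
The aggregate() call above fills the columns by position, so in a group with only one sound (JK, 1 in the question's data) the response lands in the second column rather than the third. If the response should always go to V6, as Edit 2 in the question asks, one possible base R sketch is shown below. It assumes each (V1, V2) group has at most two non-response rows and at most one "Response" row. The helper name collapse_group and the result name wide are hypothetical, and the numbers in V4 are quoted here only to make the character type explicit.

```r
# Example data from the question.
df <- data.frame(
  V1 = c(rep("SN", 8), rep("JK", 4)),
  V2 = c(1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2),
  V3 = c("Picture", "Response", "Sound", "Sound", "Response", "Sound",
         "Sound", "Response", "Sound", "Response", "Sound", "Sound"),
  V4 = c("Photo", "100", "XYZc02i03", "XYZq02i03", "200", "ZYXc01i30",
         "ZYXq01i30", "100", "XYZc02i40", "200", "XYZc02i03", "XYZq02i03"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse one (V1, V2) group into a single row.
collapse_group <- function(g) {
  is_resp <- g$V3 == "Response"
  stim <- g$V4[!is_resp]      # Picture/Sound values, in their original order
  resp <- g$V4[is_resp]       # Response value, if the group has one
  data.frame(
    V1 = g$V1[1],
    V2 = g$V2[1],
    V3 = g$V3[!is_resp][1],   # label of the first non-response row
    V4 = stim[1],             # indexing past the end of a vector yields NA
    V5 = stim[2],
    V6 = resp[1],
    stringsAsFactors = FALSE
  )
}

# Split by group, dropping empty combinations such as JK/3.
groups <- split(df, list(df$V1, df$V2), drop = TRUE)
wide <- do.call(rbind, lapply(groups, collapse_group))
rownames(wide) <- NULL

# The responses are stored as strings; convert them if numbers are needed.
wide$V6 <- as.numeric(wide$V6)
wide
```

The rows come out in the order split() produces, which is not necessarily the order of the original data; reorder with order() if that matters.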
