How to combine multiple rows into one observation in R

Published: 2017/3/26 2:17:12 | Tags: r, dataframe


Question

I am relatively new to R and I am stuck trying to get my data into a suitable format. It seems like the reshape package might be useful for this, but I haven't gotten any further than that.

I have a data frame in which one of the columns (V4) contains both strings and numbers. I would like to split V4 by the grouping given in V2 and V1 and attach the results to the data frame as three separate columns.
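Strictly speaking, an R vector (and therefore a data frame column) holds a single type, so a column that "contains strings and numbers" is really a character column: c() converts the numbers to strings. A quick check, using a short illustrative vector v4 built from values in the example below:

```r
# An atomic vector holds one type; c() promotes the number 200 to the string "200"
v4 <- c("Photo", "100", "XYZc02i03", 200)
typeof(v4)   # "character"
v4[4]        # "200", a string rather than a number

# Numeric values can be converted back where needed, e.g. the Response entries
as.numeric(v4[c(2, 4)])
```

This is also why the columns produced by the answer's aggregate() call are character; wrap a column in as.numeric() if it should be numeric.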

Edit: As my original example data frame did not quite capture the complexity of the problem, here is a more accurate example:
df <- data.frame(V1=c(rep("SN", 8),rep("JK", 4)), 
             V2=c(1,1,2,2,2,3,3,3,1,1,2,2), 
             V3=c("Picture", "Response", "Sound", "Sound", "Response", "Sound", "Sound", "Response", "Sound", "Response", "Sound", "Sound"), 
             V4=c("Photo", "100", "XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", 100, "XYZc02i40", 200, "XYZc02i03", "XYZq02i03" ), 
             stringsAsFactors=FALSE)


 V1 V2       V3        V4
 SN  1  Picture     Photo
 SN  1 Response       100
 SN  2    Sound XYZc02i03
 SN  2    Sound XYZq02i03
 SN  2 Response       200
 SN  3    Sound ZYXc01i30
 SN  3    Sound ZYXq01i30
 SN  3 Response       100
 JK  1    Sound XYZc02i40
 JK  1 Response       200
 JK  2    Sound XYZc02i03
 JK  2    Sound XYZq02i03

And I want to get something like this:

   V1  V2       V3          V4        V5   V6
   SN   1  Picture       Photo        NA  100
   SN   2    Sound   XYZc02i03 XYZq02i03  200
   SN   3    Sound   ZYXc01i30 ZYXq01i30  100
   JK   1    Sound   XYZc02i40        NA  200
   JK   2    Sound   XYZc02i03 XYZq02i03   NA

EDIT: I don't always have the same number of observations for each V2 group, which means there could be missing values in V4, V5, or V6 of the data frame I want to get.

Edit 2: V6 should map onto the "Response" value from V3, while V4 and V5 should ideally map onto the "Sound" values from V3, in consecutive order.

I would appreciate any advice on how to go about this. Or, if this problem has been solved elsewhere and I missed it, a link would also be great.

Solution

You don't need a cbind in your definition of df. You'd use something like this:
df <- data.frame(V1=rep("SN", 6), 
                 V2=rep(2:3, each=3), 
                 V3=c("Sound", "Sound", "Response", "Sound", "Sound", "Response"), 
                 V4=c("XYZc02i03", "XYZq02i03", 200, "ZYXc01i30", "ZYXq01i30", 100), 
                 stringsAsFactors=FALSE)
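
Why the cbind is unnecessary, assuming the earlier version of df was built as data.frame(cbind(...)): cbind() on plain vectors creates a matrix, and a matrix holds a single type, so every column is converted to character before data.frame() ever sees it. A minimal comparison (v2, v4, m, df_cbind and df_direct are illustrative names):

```r
v2 <- rep(2:3, each = 3)
v4 <- c("XYZc02i03", "XYZq02i03", "200", "ZYXc01i30", "ZYXq01i30", "100")

# cbind() first builds a single-type matrix, so the integers in v2 become strings
m <- cbind(V2 = v2, V4 = v4)
df_cbind <- data.frame(m, stringsAsFactors = FALSE)

# Passing the vectors straight to data.frame() keeps each column's own type
df_direct <- data.frame(V2 = v2, V4 = v4, stringsAsFactors = FALSE)

typeof(m)            # "character"
class(df_cbind$V2)   # "character"
class(df_direct$V2)  # "integer"
```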

But given a data frame like the one you describe, you can get the desired results with:
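# Pad every group's V4 values to the same length so aggregate() can return them as a matrix column
max.subset.len <- 3 # or compute it: max(sapply(split(df, list(df$V1, df$V2)), FUN=nrow))
fun <- function(v4) {length(v4) <- max.subset.len; v4}  # extends with NA (or truncates)
agg <- aggregate(df$V4, by=list(df$V1, df$V2), FUN=fun)  # one row per V1/V2 combination
results <- cbind(agg[1:2], agg[[3]])  # spread the matrix column into separate columns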
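
The aggregate() approach fills the padded columns in row order, so with the 12-row example from the question the Response value of a two-row group such as SN 1 lands in the second column rather than in V6. A minimal base R sketch that follows Edit 2 instead, assuming each V1/V2 group has at most two non-Response rows (Sound or Picture) and at most one Response row; collapse_group and key are hypothetical names:

```r
df <- data.frame(V1 = c(rep("SN", 8), rep("JK", 4)),
                 V2 = c(1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2),
                 V3 = c("Picture", "Response", "Sound", "Sound", "Response",
                        "Sound", "Sound", "Response", "Sound", "Response",
                        "Sound", "Sound"),
                 V4 = c("Photo", "100", "XYZc02i03", "XYZq02i03", 200,
                        "ZYXc01i30", "ZYXq01i30", 100, "XYZc02i40", 200,
                        "XYZc02i03", "XYZq02i03"),
                 stringsAsFactors = FALSE)

# Collapse one V1/V2 group into a single row (hypothetical helper).
# Assumes at most two non-Response rows and at most one Response row per group.
collapse_group <- function(g) {
  is_resp <- g$V3 == "Response"
  stim <- g$V4[!is_resp]        # Sound/Picture values, in row order
  length(stim) <- 2             # pad with NA (or truncate) to exactly two slots
  data.frame(V1 = g$V1[1],
             V2 = g$V2[1],
             V3 = g$V3[!is_resp][1],  # NA if the group has no stimulus row
             V4 = stim[1],
             V5 = stim[2],
             V6 = g$V4[is_resp][1],   # NA if the group has no Response row
             stringsAsFactors = FALSE)
}

# Split by V1/V2, keeping groups in the order they first appear
key <- paste(df$V1, df$V2, sep = "_")
groups <- split(df, factor(key, levels = unique(key)))
result <- do.call(rbind, lapply(groups, collapse_group))
rownames(result) <- NULL
result
```

Printing result gives the five rows of the target table, with NA in V5 or V6 where a group lacks a second Sound row or a Response row. V6 stays character; as.numeric(result$V6) converts it if needed.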


                        
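This concludes the article on how to combine multiple rows into one observation in R. We hope the recommended answer is helpful, and thank you for supporting IT屋!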
