Aggregate adjacent rows, ignoring certain columns

Views: 96 Published: 2020/6/2 20:42:06 Tags: r aggregate reshape


Problem description

I have a df like the one below

> head(df)
  OrderId           Timestamp ErrorCode
1 3000000 1455594300434609920        NA
2 3000001 1455594300434614272        NA
3 3000000 1455594300440175104         0
4 3000001 1455594300440179712         0
5 3000002 1455594303468741120        NA
6 3000002 1455594303469326848         0

I need to collapse the rows so that the output looks something like this

> head(df)
  OrderId          Timestamp1          Timestamp2 ErrorCode Diff
  3000000 1455594300434609920 1455594300440175104         0
  3000001 1455594300434614272 1455594300440179712         0
  3000002 1455594303468741120 1455594303469326848         0

I used df2=aggregate(Timestamp~.,df,FUN=toString), but the output is

   OrderId ErrorCode           Timestamp
10 3000001         0 1455594300440179712
11 3000002         0 1455594303469326848
12 3000003         0 1455594303713897984

When I dropped the ErrorCode column and used the same command, I got the expected output

> head(kf)
  OrderId           Timestamp
1 3000000 1455594300434609920
2 3000001 1455594300434614272
3 3000000 1455594300440175104
4 3000001 1455594300440179712
5 3000002 1455594303468741120
6 3000002 1455594303469326848
> kf2=aggregate(Timestamp~.,kf,FUN=toString)
> head(kf2)
   OrderId                                Timestamp
10 3000001 1455594300434614272, 1455594300440179712
11 3000002 1455594303468741120, 1455594303469326848
12 3000003 1455594303711330816, 1455594303713897984

How do I aggregate it in the above manner without removing the ErrorCode column? There must be some little thing I am missing.

Solution

I take it you're actually looking just to reshape your data into a wide format with separate columns for timestamp 1 and 2. One way is to first add a new column that defines the time point of the measurement and then melt and cast the data using reshape2.

# Add an index to the data.frame
for (i in unique(df$OrderId)) {
  ii <- df$OrderId == i
  df$time_ind[ii] <- seq_along(ii[ii])
}

library(reshape2)

df_long <- melt(df, id.vars = c("OrderId", "time_ind"),
                measure.vars = c("Timestamp", "ErrorCode"))

dcast(df_long, OrderId ~ variable + time_ind)

which will give you

  OrderId         Timestamp_1         Timestamp_2 ErrorCode_1 ErrorCode_2
1 3000000 1455594300434609920 1455594300440175104        <NA>           0
2 3000001 1455594300434614272 1455594300440179712        <NA>           0
3 3000002 1455594303468741120 1455594303469326848        <NA>           0
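
As a side note on why the original aggregate() call lost rows: with Timestamp ~ ., ErrorCode becomes a grouping variable, and the formula method's default na.action = na.omit drops every row where it is NA before grouping. The sketch below works on a toy copy of the question's data, with timestamps stored as character strings (an assumption, since values this large cannot be held exactly in a double). It groups by OrderId only and merges ErrorCode back in, then shows ave() and base reshape() as one possible alternative to the loop and reshape2. The names ts_by_order, err, merged and wide are illustrative, not from the original post.

```r
# Toy data mirroring the question. Timestamps are character strings
# (assumption) to avoid losing precision in doubles.
df <- data.frame(
  OrderId   = c(3000000, 3000001, 3000000, 3000001, 3000002, 3000002),
  Timestamp = c("1455594300434609920", "1455594300434614272",
                "1455594300440175104", "1455594300440179712",
                "1455594303468741120", "1455594303469326848"),
  ErrorCode = c(NA, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# The original call: rows with ErrorCode = NA are removed by na.omit,
# so each (OrderId, ErrorCode) group keeps a single timestamp.
dropped <- aggregate(Timestamp ~ ., df, FUN = toString)

# Group by OrderId only; ErrorCode is not in the formula, so its NAs
# no longer cause rows to be dropped.
ts_by_order <- aggregate(Timestamp ~ OrderId, df, FUN = toString)

# Summarise ErrorCode separately (NA rows are omitted here on purpose)
# and join the two results on OrderId.
err    <- aggregate(ErrorCode ~ OrderId, df, FUN = max)
merged <- merge(ts_by_order, err, by = "OrderId")

# Alternative to the for loop: number the rows within each OrderId.
df$time_ind <- ave(seq_along(df$OrderId), df$OrderId, FUN = seq_along)

# Base R reshape to wide format, one row per OrderId.
wide <- reshape(df, idvar = "OrderId", timevar = "time_ind",
                direction = "wide", sep = "_")
```

Here merged holds one row per OrderId with the comma-separated timestamps and the non-missing ErrorCode, while wide has separate Timestamp_1 / Timestamp_2 columns, similar in shape to the dcast() result above.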
