Aggregate adjacent rows, ignoring certain columns


Problem description

I have a data frame, df, like the one below:

> head(df)
  OrderId           Timestamp ErrorCode
1 3000000 1455594300434609920        NA
2 3000001 1455594300434614272        NA
3 3000000 1455594300440175104         0
4 3000001 1455594300440179712         0
5 3000002 1455594303468741120        NA
6 3000002 1455594303469326848         0

I need to collapse the rows so that the output looks something like this:

> head(df)
  OrderId          Timestamp1          Timestamp2 ErrorCode Diff
  3000000 1455594300434609920 1455594300440175104         0
  3000001 1455594300434614272 1455594300440179712         0
  3000002 1455594303468741120 1455594303469326848         0

I used df2 = aggregate(Timestamp ~ ., df, FUN = toString), but the output is:

   OrderId ErrorCode           Timestamp
10 3000001         0 1455594300440179712
11 3000002         0 1455594303469326848
12 3000003         0 1455594303713897984

When I drop the ErrorCode column and use the same command, I get the expected output:

> head(kf)
  OrderId           Timestamp
1 3000000 1455594300434609920
2 3000001 1455594300434614272
3 3000000 1455594300440175104
4 3000001 1455594300440179712
5 3000002 1455594303468741120
6 3000002 1455594303469326848
> kf2 = aggregate(Timestamp ~ ., kf, FUN = toString)
> head(kf2)
   OrderId                                Timestamp
10 3000001 1455594300434614272, 1455594300440179712
11 3000002 1455594303468741120, 1455594303469326848
12 3000003 1455594303711330816, 1455594303713897984

How do I aggregate it in the above manner without removing the ErrorCode column? There must be some little thing I am missing.

Solution

I take it you're actually looking just to reshape your data into a wide format with separate columns for timestamp 1 and 2. One way is to first add a new column that defines the time point of the measurement and then melt and cast the data using reshape2.

# Add an index to the data.frame
for (i in unique(df$OrderId)) {
  ii <- df$OrderId == i
  df$time_ind[ii] <- seq_along(ii[ii])
}

library(reshape2)

df_long <- melt(df, id.vars = c("OrderId", "time_ind"),
                measure.vars = c("Timestamp", "ErrorCode"))

dcast(df_long, OrderId ~ variable + time_ind)

which will give you

  OrderId         Timestamp_1         Timestamp_2 ErrorCode_1 ErrorCode_2
1 3000000 1455594300434609920 1455594300440175104        <NA>           0
2 3000001 1455594300434614272 1455594300440179712        <NA>           0
3 3000002 1455594303468741120 1455594303469326848        <NA>           0
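
If you would rather avoid the extra package, roughly the same wide layout can be sketched in base R with ave() and reshape(). This is only an illustration, not part of the original answer. It assumes the timestamps are stored as character strings (so they print in full) and that each OrderId has exactly two rows. The names wide and Diff are made up for the example.

```r
# Sample data from the question. Timestamps are kept as character strings
# (an assumption made here) so that R prints every digit.
df <- data.frame(
  OrderId   = c(3000000, 3000001, 3000000, 3000001, 3000002, 3000002),
  Timestamp = c("1455594300434609920", "1455594300434614272",
                "1455594300440175104", "1455594300440179712",
                "1455594303468741120", "1455594303469326848"),
  ErrorCode = c(NA, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# Number the rows within each OrderId (1, 2, ...) without an explicit loop.
df$time_ind <- ave(seq_along(df$OrderId), df$OrderId, FUN = seq_along)

# Spread Timestamp and ErrorCode into one set of columns per time_ind,
# giving Timestamp_1, ErrorCode_1, Timestamp_2, ErrorCode_2.
wide <- reshape(df, idvar = "OrderId", timevar = "time_ind",
                direction = "wide", sep = "_")

# Difference between the two timestamps. as.numeric() yields doubles, which
# cannot represent every integer of this size, so treat Diff as approximate.
wide$Diff <- as.numeric(wide$Timestamp_2) - as.numeric(wide$Timestamp_1)

print(wide)
```

If exact arithmetic on nanosecond timestamps matters, a 64-bit integer type is probably a safer choice than doubles for the Diff column.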

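As for why the original aggregate() call lost rows: in the formula Timestamp ~ ., the dot stands for every other column, so ErrorCode becomes a grouping variable too, and the formula method's default na.action = na.omit drops the rows where ErrorCode is NA before any grouping happens. A sketch of one way around that, naming only OrderId on the right-hand side, is below. It again assumes character timestamps, and the names lossy, ts, ec and result are just for illustration.

```r
# Sample data from the question, with timestamps as character strings
# (an assumption made for this sketch).
df <- data.frame(
  OrderId   = c(3000000, 3000001, 3000000, 3000001, 3000002, 3000002),
  Timestamp = c("1455594300434609920", "1455594300434614272",
                "1455594300440175104", "1455594300440179712",
                "1455594303468741120", "1455594303469326848"),
  ErrorCode = c(NA, NA, 0, 0, NA, 0),
  stringsAsFactors = FALSE
)

# The original call: ErrorCode is a grouping variable and its NA rows are
# omitted, so only one timestamp per order survives.
lossy <- aggregate(Timestamp ~ ., df, FUN = toString)

# Group by OrderId only; ErrorCode never enters the model frame, so its
# NA values no longer cause rows to be dropped.
ts <- aggregate(Timestamp ~ OrderId, df, FUN = toString)

# Recover one non-missing ErrorCode per order (the largest, as an arbitrary
# choice) and merge it back; all.x = TRUE keeps orders with no error code.
ec <- aggregate(ErrorCode ~ OrderId, df, FUN = max)
result <- merge(ts, ec, by = "OrderId", all.x = TRUE)

print(result)
```

Taking max() of the error codes is only a placeholder; pick whatever rule fits your data when an order has more than one non-missing code.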