Returning first row of group

Question

I have a data frame consisting of an ID, which is the same for every element in a group, two datetimes, and the time interval between them. One of the datetime columns is my relevant time marker. Now I would like to get a subset of the data frame that consists of the earliest entry for each group. The entries (especially the time interval) need to stay untouched.

My first approach was to sort the frame first by ID and then by the relevant datetime. However, I wasn't able to return the first entry for each new group.

I then looked at the aggregate() and ddply() functions, but I could not find an option in either that simply returns the first entry without applying an aggregation function to the time interval value.
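
For reference, one way to take the earliest row per group in base R without aggregating any column is to split the data frame by ID and keep, in each piece, the row with the smallest Start. This is a sketch under stated assumptions: the column names `ID`, `Start` and `Interval` follow the example data, while the miniature data frame `events` and the helper `earliest_per_group()` are hypothetical. It is an alternative technique (split plus `which.min`), not the `aggregate()` or `ddply()` route mentioned in the question, and it does not require the data to be sorted first.

```r
# Hypothetical miniature of the question's data: an ID per group,
# a Start time marker, and an Interval that must stay untouched.
events <- data.frame(
  ID       = c(1454L, 1322L, 1454L, 1454L),
  Start    = as.POSIXct(c("2013-01-01 10:00:00", "2013-01-03 13:51:14",
                          "2013-01-05 07:07:24", "2013-01-10 16:09:30"),
                        tz = "UTC"),
  Interval = c(1206.17, 2403.97, 3246.15, 6608.48)
)

# Hypothetical helper: for each ID, keep the whole row whose Start is
# earliest. No column is summarised, so Interval is copied as-is.
earliest_per_group <- function(df) {
  pieces <- split(df, df$ID)                      # one data frame per ID
  firsts <- lapply(pieces, function(g) g[which.min(g$Start), ])
  out <- do.call(rbind, firsts)                   # stack the chosen rows
  rownames(out) <- NULL                           # drop "1322", "1454" names
  out
}

earliest_per_group(events)
```

Because whole rows are selected rather than summarised, every column (including Interval) comes through exactly as it was in the input.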

Is there an (easy) way to accomplish this?

ADDITION: My notes about aggregate() and ddply() may have been unclear. I do not necessarily need to aggregate. Since the data frame is sorted so that the first row of each new group is the row I am looking for, it would suffice to return a subset containing each row whose ID differs from the ID on the row before (that is, the start row of each new group).
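
The "ID differs from the row before" idea can be written directly as a logical index that compares each ID with its predecessor. A minimal sketch, assuming the rows are already sorted so that each group is contiguous and IDs contain no NA values; the data frame `sorted` and the helper `is_group_start()` are hypothetical names for illustration.

```r
# Hypothetical sorted data: rows of the same ID are contiguous and the
# first row of each run is the one to keep.
sorted <- data.frame(
  ID       = c(1322L, 1454L, 1454L, 1454L, 1488L),
  Interval = c(2403.97, 1206.17, 3246.15, 6608.48, 495.85)
)

# Hypothetical helper: TRUE for the first row, then TRUE wherever the ID
# differs from the ID on the previous row (the start of a new group).
# Assumes no NA IDs; an empty input yields an empty logical vector.
is_group_start <- function(id) {
  if (length(id) == 0) return(logical(0))
  c(TRUE, id[-1] != id[-length(id)])
}

sorted[is_group_start(sorted$ID), ]
```

Unlike `duplicated()`, this only looks one row back, so it relies on the groups being contiguous: an ID that reappears after a different ID would be selected a second time.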

Example data:

data <- structure(list(ID = c(1454L, 1322L, 1454L, 1454L, 1855L, 1669L, 
1727L, 1727L, 1488L), Line = structure(c(2L, 1L, 3L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("A", "B", "C"), class = "factor"), 
    Start = structure(c(1357038060, 1357221074, 1357369644, 1357834170, 
    1357913412, 1358151763, 1358691675, 1358789411, 1359538400
    ), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1357110430, 
    1357365312, 1357564413, 1358230679, 1357978810, 1358674600, 
    1358853933, 1359531923, 1359568151), class = c("POSIXct", 
    "POSIXt"), tzone = ""), Interval = c(1206.16666666667, 2403.96666666667, 
    3246.15, 6608.48333333333, 1089.96666666667, 8713.95, 2704.3, 
    12375.2, 495.85)), .Names = c("ID", "Line", "Start", "End", 
"Interval"), row.names = c(NA, -9L), class = "data.frame")

Answer

By reproducing the example data frame and testing it, I found a way to get the needed result:

  1. Order data by relevant columns (ID, Start)

# Sort by ID, then by Start, so each ID's earliest row comes first
ordered_data <- data[order(data$ID, data$Start), ]

  2. Find the first row for each new ID

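# duplicated() flags every repeat of an ID after its first occurrence;
# negating it keeps only the first (earliest) row of each ID
final <- ordered_data[!duplicated(ordered_data$ID), ]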
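
To check the two steps end to end, here is a self-contained run on a trimmed copy of the example data. Assumptions: the `Line` and `End` columns are dropped for brevity, and the Start times are rebuilt from their numeric values in UTC; the ordering uses the underlying numeric time, so the result does not depend on the time zone.

```r
# Trimmed copy of the question's example data (Line and End omitted).
data <- data.frame(
  ID       = c(1454L, 1322L, 1454L, 1454L, 1855L, 1669L, 1727L, 1727L, 1488L),
  Start    = as.POSIXct(c(1357038060, 1357221074, 1357369644, 1357834170,
                          1357913412, 1358151763, 1358691675, 1358789411,
                          1359538400), origin = "1970-01-01", tz = "UTC"),
  Interval = c(1206.16666666667, 2403.96666666667, 3246.15, 6608.48333333333,
               1089.96666666667, 8713.95, 2704.3, 12375.2, 495.85)
)

# Step 1: sort by ID, then by Start within each ID.
ordered_data <- data[order(data$ID, data$Start), ]

# Step 2: keep only the first occurrence of each ID.
final <- ordered_data[!duplicated(ordered_data$ID), ]

final
```

The result has one row per ID (1322, 1454, 1488, 1669, 1727, 1855), taken from original rows 2, 1, 9, 6, 7 and 5. Each Interval is carried over unchanged; for example, ID 1454 keeps 1206.16666666667 from its earliest start.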
