Subset by first and last value per group

Question

I have a data frame in R with two columns, temp and timeStamp. The temp values are recorded at regular intervals. A portion of the data frame looks like this:

I have to create a line chart showing changes in temp over time. As can be seen here, temp stays the same across several consecutive timeStamp values. These repeated values increase the size of the data file, and I want to remove them. The output should look like this:

It should show just the values where there is a change. I cannot think of a way to get this done in R. Any input in the right direction would be really helpful.

Recommended answer

One option is to use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'temp', we subset the first and last observation of each group (.SD[c(1L, .N)]). If a group has only a single row, we return that row as is (else .SD).

library(data.table)
setDT(df1)[, if(.N>1) .SD[c(1L, .N)] else .SD, by =temp]
#    temp val
#1: 22.50   1
#2: 22.50   4
#3: 22.37   5
#4: 22.42   6
#5: 22.42   7
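
Note that grouping by 'temp' pools every row that shares a value, even when the same temperature comes back later in the series. For time-ordered data it may be safer to group by consecutive runs instead. The following is only a sketch under that assumption: it uses data.table's rleid() to give each run its own id, and df2, run and res_run are hypothetical names introduced for illustration, not part of the original answer.

```r
library(data.table)

# Hypothetical sample: 22.5 occurs in two separate runs (rows 1-3 and rows 5-6)
df2 <- data.table(temp = c(22.5, 22.5, 22.5, 22.37, 22.5, 22.5), val = 1:6)

# rleid() assigns a new id every time temp changes, so each run is its own group;
# keep the first and last row of every run, or the single row if the run has length 1
res_run <- df2[, if (.N > 1) .SD[c(1L, .N)] else .SD, by = .(run = rleid(temp))]
res_run
```

With by = temp on the same df2, the five rows at 22.5 would form a single group, so the end of the first run and the start of the second would be dropped.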

Or a base R option with duplicated. We check for duplicated values in 'temp' (the output is a logical vector), and also check for duplicates from the reverse side (fromLast=TRUE). Use & to find the elements that are TRUE in both cases, i.e. rows that are neither the first nor the last occurrence of their value, then negate (!) and subset the rows of 'df1'.

df1[!(duplicated(df1$temp) & duplicated(df1$temp,fromLast=TRUE)),]
#   temp val
#1 22.50   1
#4 22.50   4
#5 22.37   5
#6 22.42   6
#7 22.42   7
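
Like grouping by 'temp', duplicated() looks at values across the whole column, so a temperature that returns after a change is treated as a repeat of its earlier occurrence. A possible base R variant that compares each row only with its neighbours is sketched below. It assumes the rows are already ordered by time and that the data frame has at least one row; keep_run_ends and df2 are hypothetical names used for illustration.

```r
# Keep only the rows that start or end a run of identical consecutive values
keep_run_ends <- function(df, col = "temp") {
  x <- df[[col]]
  n <- length(x)
  starts <- c(TRUE, x[-1] != x[-n])  # row differs from the previous row
  ends   <- c(x[-n] != x[-1], TRUE)  # row differs from the next row
  df[starts | ends, , drop = FALSE]
}

# Hypothetical sample: 22.5 occurs in two separate runs
df2 <- data.frame(temp = c(22.5, 22.5, 22.5, 22.37, 22.5, 22.5), val = 1:6)
res_base <- keep_run_ends(df2)
res_base
```

On the df1 defined under "data", this keeps the same rows (1, 4, 5, 6, 7) as the duplicated() approach, because no value recurs there after a change.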



Data

df1 <- data.frame(temp=c(22.5, 22.5, 22.5, 22.5, 22.37,22.42, 22.42), val=1:7)
