Removing time series with only zero values from a data frame

Problem description

I have a data frame containing multiple time series, each identified by a unique ID. I would like to remove any time series that has only 0 values.

The data frame looks as follows:

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
B    2010/01/01    0
B    2010/01/02    0
B    2010/01/03    0
B    2010/01/04    0
B    2010/01/05    0
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

I want any time series with only 0 values to be removed, so that the data frame looks as follows:

id   date          value
AAA  2010/01/01    9
AAA  2010/01/02    10
AAA  2010/01/03    8
AAA  2010/01/04    4
AAA  2010/01/05    12
CCC  2010/01/01    45
CCC  2010/01/02    46
CCC  2010/01/03    0
CCC  2010/01/04    0
CCC  2010/01/05    40

This is a follow-up to a previous question that was answered with a really great solution using the data.table package:

R efficiently removing missing values from the start and end of multiple time series in 1 data frame

Solution

If dat is a data.table, then this is easy to write and read:

dat[,.SD[any(value!=0)],by=id]

.SD stands for Subset of Data. This answer explains .SD very well.
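
As a minimal sketch of how that one-liner behaves, the example below rebuilds the question's data as a data.table and applies it. The object name result and the use of Date values for the date column are assumptions made for illustration; the original post does not say how dat was constructed.

```r
library(data.table)

# Rebuild the example data from the question (construction is an assumption)
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12,  0, 0, 0, 0, 0,  45, 46, 0, 0, 40)
)

# Within each id, any(value != 0) is a single TRUE or FALSE.
# .SD[TRUE] keeps every row of the group; .SD[FALSE] keeps none.
result <- dat[, .SD[any(value != 0)], by = id]

print(result)
```

Group B is dropped because all of its values are 0, while CCC is kept in full, including its two zero rows, because at least one of its values is non-zero.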

Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times (a source of typo bugs when you have many long or similar variable names), try:

dat[ ave(value!=0,id,FUN=any) ]
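
For a plain data.frame, the same idea can be sketched in base R. This is a reconstruction of the ave approach mentioned above, not a quotation of it, and the names DF, keep and filtered are illustrative.

```r
# Plain data.frame version of the example data (base R only)
DF <- data.frame(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12,  0, 0, 0, 0, 0,  45, 46, 0, 0, 40)
)

# ave() applies FUN within each id and returns a vector as long as the input,
# so every row of a group receives that group's any() result
keep <- ave(DF$value != 0, DF$id, FUN = any)

# Logical row subsetting; note DF appears three times in the one-line form
# DF[ave(DF$value != 0, DF$id, FUN = any), ]
filtered <- DF[keep, ]

print(filtered)
```

Inside a data.table's square brackets the column names are visible directly, which is why the data.table form above can write value and id without the DF$ prefix.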

The speed difference between the two may depend on several factors, including: i) the number of groups, ii) the size of each group, and iii) the number of columns in the real dat.
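
One rough way to check this on your own data is to time both forms with system.time. The sketch below uses synthetic data; the table name big and the sizes n_groups and n_per are hypothetical, so the timings say nothing about any particular real dataset.

```r
library(data.table)

set.seed(1)
n_groups <- 2000L   # hypothetical number of series
n_per    <- 50L     # hypothetical length of each series

big <- data.table(
  id    = rep(seq_len(n_groups), each = n_per),
  value = rpois(n_groups * n_per, lambda = 1)
)

# Force a quarter of the groups to be all zero so there is something to drop
zero_ids <- sample(n_groups, n_groups %/% 4L)
big[id %in% zero_ids, value := 0L]

# Time the two forms from the answer
t_sd  <- system.time(res_sd  <- big[, .SD[any(value != 0)], by = id])
t_ave <- system.time(res_ave <- big[ave(value != 0, id, FUN = any)])

# Elapsed seconds for each approach
print(c(SD = t_sd[["elapsed"]], ave = t_ave[["elapsed"]]))
```

Varying n_groups, n_per and the number of extra columns in big is a simple way to see how each of the three factors affects the comparison.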
