Removing time series with only zero values from a data frame
Problem description
I have a data frame containing multiple time series, each identified by a unique ID. I would like to remove any time series that contains only 0 values.
The data frame looks as follows,
id date value
AAA 2010/01/01 9
AAA 2010/01/02 10
AAA 2010/01/03 8
AAA 2010/01/04 4
AAA 2010/01/05 12
B 2010/01/01 0
B 2010/01/02 0
B 2010/01/03 0
B 2010/01/04 0
B 2010/01/05 0
CCC 2010/01/01 45
CCC 2010/01/02 46
CCC 2010/01/03 0
CCC 2010/01/04 0
CCC 2010/01/05 40
I want every time series with only 0 values to be removed, so that the data frame looks as follows,
id date value
AAA 2010/01/01 9
AAA 2010/01/02 10
AAA 2010/01/03 8
AAA 2010/01/04 4
AAA 2010/01/05 12
CCC 2010/01/01 45
CCC 2010/01/02 46
CCC 2010/01/03 0
CCC 2010/01/04 0
CCC 2010/01/05 40
This is a follow-up to a previous question that was answered with a really great solution using the data.table package.
R efficiently removing missing values from the start and end of multiple time series in 1 data frame
If dat is a data.table, then this is easy to write and read:
dat[,.SD[any(value!=0)],by=id]
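To see the one-liner above in action, here is a minimal sketch that rebuilds the sample data from the question and applies it. The way the table is constructed (rep(), as.Date()) and the name res are my own choices for illustration, not part of the original question.

```r
library(data.table)

# Sample data from the question: three series, of which "B" is all zeros.
dat <- data.table(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12,  0, 0, 0, 0, 0,  45, 46, 0, 0, 40)
)

# Within each id, any(value != 0) is a single TRUE or FALSE.
# .SD[TRUE] keeps every row of that group; .SD[FALSE] keeps none.
res <- dat[, .SD[any(value != 0)], by = id]
print(res)
```

Series CCC survives even though it contains some zeros, because the test is "at least one non-zero value", not "no zeros at all".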
.SD stands for Subset of Data. This answer explains .SD very well.
Picking up on Gabor's nice use of ave, but without repeating the same variable name (DF) three times, which can be a source of typos if you have many long or similar variable names, try:
dat[ ave(value!=0,id,FUN=any) ]
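As a rough illustration of why this works, the sketch below builds the logical vector that ave() returns and uses it on both a plain data.frame and a data.table. The data.frame line is only my guess at the general shape of Gabor's version, since that answer is not reproduced here. The names keep, res_df and res_dt are hypothetical.

```r
library(data.table)

# Sample data from the question as a plain data.frame.
DF <- data.frame(
  id    = rep(c("AAA", "B", "CCC"), each = 5),
  date  = rep(as.Date("2010-01-01") + 0:4, times = 3),
  value = c(9, 10, 8, 4, 12,  0, 0, 0, 0, 0,  45, 46, 0, 0, 40)
)

# ave() applies FUN within each id and returns a vector as long as its input,
# so every row receives its own group's TRUE or FALSE.
keep <- ave(DF$value != 0, DF$id, FUN = any)

# data.frame form (assumed): the name DF appears three times in total.
res_df <- DF[keep, ]

# data.table form: columns are visible inside [ ], so no DF$ prefixes
# and no trailing comma are needed.
dat    <- as.data.table(DF)
res_dt <- dat[ave(value != 0, id, FUN = any)]
print(res_dt)
```

Unlike the .SD version, this one never splits the whole table by group; it only computes one logical vector and subsets once.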
The speed difference between the two may depend on several factors, including: i) the number of groups, ii) the size of each group, and iii) the number of columns in the real dat.
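If you want to check this on your own data, a quick timing sketch along these lines may help. The synthetic data, the sizes n_groups and group_size, and the share of all-zero series are all made-up assumptions; adjust them to resemble your real dat before drawing any conclusions.

```r
library(data.table)

set.seed(1)
n_groups   <- 2000L  # hypothetical number of series
group_size <- 50L    # hypothetical length of each series

# Synthetic data: integer counts, sorted by id.
dat <- data.table(
  id    = rep(seq_len(n_groups), each = group_size),
  value = rpois(n_groups * group_size, lambda = 2)
)

# Force one in ten series to be all zeros.
zero_ids <- sample(n_groups, n_groups %/% 10L)
dat[id %in% zero_ids, value := 0L]

# Time both approaches on the same table.
t_sd  <- system.time(a <- dat[, .SD[any(value != 0)], by = id])
t_ave <- system.time(b <- dat[ave(value != 0, id, FUN = any)])

# Both should keep exactly the same rows.
stopifnot(identical(a$id, b$id), identical(a$value, b$value))

print(c(SD = t_sd[["elapsed"]], ave = t_ave[["elapsed"]]))
```

A single system.time() call is noisy, so repeat the run a few times and vary the three factors listed above before trusting any difference.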