Convert an irregular time series of a data table with factors into a regular time series in R


Problem description

I am trying to convert an irregular time series in a data table into a regular time series. My data looks like this:

library(data.table)
dtRes <- data.table(time  = c(0.1, 0.8, 1, 2.3, 2.4, 4.8, 4.9),
                    abst  = c(1, 1, 1, 0, 0, 3, 3),
                    farbe = as.factor(c("keine", "keine", "keine", "keine", "keine", "rot", "blau")),
                    gier  = c(2.5, 2.5, 2.5, 0, 0, 3, 3),
                    goff  = as.factor(c("haus", "maus", "toll", "maus", NA, "maus", "maus")),
                    huft  = as.factor(c(NA, NA, NA, "wolle", "wolle", "holz", "holz")),
                    mode  = c(4, 4, 4, 2.5, NA, 3, 3))

How can I aggregate the observations into chunks of, say, 1 second? Each chunk can contain a variable number of rows, even 0 if there are no rows within that second. The result should be the mean for the numeric columns (NAs omitted); for the factors, a whole duplicated row if there is more than one unique value. If that is not possible for factors, or doesn't make sense to you, it is also fine to just take the first value of the factor column within that second. That way it would be a truly regular time series without any duplicated times. If there is no value for an interval (like the 2nd second in the example), the result is NA.

In the end, the result could look like this (depending on whether rows are duplicated or not).

With duplicates:

wiDups <- data.table(time  = c(1, 1, 2, 3, 4, 5, 5),
                     abst  = c(1, 1, NA, 1, NA, 5, 5),
                     farbe = as.factor(c("keine", "keine", NA, "keine", NA, "rot", "blau")),
                     gier  = c(2.5, 2.5, NA, 0, NA, 4.5, 4.5),
                     goff  = as.factor(c("haus", "maus", NA, "maus", NA, "maus", "maus")),
                     huft  = as.factor(c(NA, NA, NA, "wolle", NA, "holz", "holz")),
                     mode  = c(5, 5, NA, 2.5, NA, 4, 4))

And without duplicates:

noDups <- data.table(time  = c(1, 2, 3, 4, 5),
                     abst  = c(1, NA, 1, NA, 5),
                     farbe = as.factor(c("keine", NA, "keine", NA, "rot")),
                     gier  = c(2.5, NA, 0, NA, 4.5),
                     goff  = as.factor(c("haus", NA, "maus", NA, "maus")),
                     huft  = as.factor(c(NA, NA, "wolle", NA, "holz")),
                     mode  = c(5, NA, 2.5, NA, 4))

Is it better to convert it into a time series object?

Recommended answer

Here is a data.table answer.

Rounding time to the nearest second (note that round() rounds to the nearest integer, not up):

> dtRes[, 
+       lapply(.SD, function(z) {return(ifelse(is.factor(z), levels(z)[unique(z)[1]], mean(z, na.rm = TRUE)))} ), 
+       by = .(time = round(time, digits = 0))]
   time abst farbe gier goff  huft mode
1:    0    1 keine  2.5 haus  <NA>  4.0
2:    1    1 keine  2.5 maus  <NA>  4.0
3:    2    0 keine  0.0 maus wolle  2.5
4:    5    3   rot  3.0 maus  holz  3.0

Using the ceiling function, so that every observation is assigned to the second in which it ends:

> dtRes[, 
+       lapply(.SD, function(z) {return(ifelse(is.factor(z), levels(z)[unique(z)[1]], mean(z, na.rm = TRUE)))} ), 
+       by = .(time = ceiling(time))]
   time abst farbe gier goff  huft mode
1:    1    1 keine  2.5 haus  <NA>  4.0
2:    3    0 keine  0.0 maus wolle  2.5
3:    5    3   rot  3.0 maus  holz  3.0
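
The aggregated result above only contains seconds that have at least one observation, whereas the question asks for NA rows in empty intervals. A minimal sketch of one way to fill those gaps is to join the aggregate onto a full grid of seconds. The names `agg`, `grid` and `regular` are illustrative, and the grid is assumed to run from second 1 to the last observed second.

```r
library(data.table)

dtRes <- data.table(time  = c(0.1, 0.8, 1, 2.3, 2.4, 4.8, 4.9),
                    abst  = c(1, 1, 1, 0, 0, 3, 3),
                    farbe = as.factor(c("keine", "keine", "keine", "keine", "keine", "rot", "blau")),
                    gier  = c(2.5, 2.5, 2.5, 0, 0, 3, 3),
                    goff  = as.factor(c("haus", "maus", "toll", "maus", NA, "maus", "maus")),
                    huft  = as.factor(c(NA, NA, NA, "wolle", "wolle", "holz", "holz")),
                    mode  = c(4, 4, 4, 2.5, NA, 3, 3))

# Aggregate per second: first value for factors (as character), mean for numerics.
agg <- dtRes[, lapply(.SD, function(z) {
               if (is.factor(z)) as.character(z)[1] else mean(z, na.rm = TRUE)
             }),
             by = .(time = ceiling(time))]

# Full grid of seconds (assumption: the series starts at second 1).
grid <- data.table(time = as.numeric(seq_len(max(agg$time))))

# Join the aggregate onto the grid: seconds without observations become NA rows.
regular <- agg[grid, on = "time"]
print(regular)
```

Because the join keeps one row per grid second, the result has no duplicated times and no missing seconds, which matches the "without duplicates" layout in the question.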

You can adjust the logic for returning the level based on what you want. Here I'm returning the level corresponding to the first unique value, i.e. the first value of the factor within each group.
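
As one example of adjusting that logic, the sketch below returns the most frequent level in each group instead of the first one. The helper `most_frequent` is hypothetical (it is not part of data.table), and ties are resolved by level order rather than by order of appearance.

```r
library(data.table)

dtRes <- data.table(time  = c(0.1, 0.8, 1, 2.3, 2.4, 4.8, 4.9),
                    abst  = c(1, 1, 1, 0, 0, 3, 3),
                    farbe = as.factor(c("keine", "keine", "keine", "keine", "keine", "rot", "blau")),
                    gier  = c(2.5, 2.5, 2.5, 0, 0, 3, 3),
                    goff  = as.factor(c("haus", "maus", "toll", "maus", NA, "maus", "maus")),
                    huft  = as.factor(c(NA, NA, NA, "wolle", "wolle", "holz", "holz")),
                    mode  = c(4, 4, 4, 2.5, NA, 3, 3))

# Hypothetical helper: most frequent non-NA level of a factor.
# table() drops NA by default; which.max() picks the first maximum,
# so ties go to the level that comes first in levels(z).
most_frequent <- function(z) {
  counts <- table(z)
  if (sum(counts) == 0L) return(NA_character_)  # group contains only NA
  names(counts)[which.max(counts)]
}

res <- dtRes[, lapply(.SD, function(z) {
               if (is.factor(z)) most_frequent(z) else mean(z, na.rm = TRUE)
             }),
             by = .(time = ceiling(time))]
print(res)
```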

You could also switch to using as.numeric in the ifelse statement, with similar results. Note that the data type of the factor columns changes to character. If you need factors, you can convert them back explicitly in a separate statement or by chaining:

dtRes[, lapply(.SD, ....), by = .(....)][, lapply(.SD, as.factor), .SDcols = c(....)]  # .SDcols: character vector of the columns you want as factors
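
A runnable version of the chaining idea might look like the sketch below. It swaps the second `lapply` for an in-place `:=` update so that the numeric columns are kept alongside the converted ones. The variable `factor_cols` is an illustrative name, and it is assumed that every column that was a factor in `dtRes` should become a factor again.

```r
library(data.table)

dtRes <- data.table(time  = c(0.1, 0.8, 1, 2.3, 2.4, 4.8, 4.9),
                    abst  = c(1, 1, 1, 0, 0, 3, 3),
                    farbe = as.factor(c("keine", "keine", "keine", "keine", "keine", "rot", "blau")),
                    gier  = c(2.5, 2.5, 2.5, 0, 0, 3, 3),
                    goff  = as.factor(c("haus", "maus", "toll", "maus", NA, "maus", "maus")),
                    huft  = as.factor(c(NA, NA, NA, "wolle", "wolle", "holz", "holz")),
                    mode  = c(4, 4, 4, 2.5, NA, 3, 3))

# Columns that are factors in the input and should be factors in the output.
factor_cols <- names(dtRes)[vapply(dtRes, is.factor, logical(1))]

# Aggregate, then chain a second step that converts the character
# columns back to factors by reference.
agg <- dtRes[, lapply(.SD, function(z) {
               if (is.factor(z)) as.character(z)[1] else mean(z, na.rm = TRUE)
             }),
             by = .(time = ceiling(time))
            ][, (factor_cols) := lapply(.SD, as.factor), .SDcols = factor_cols]

str(agg)
```

The levels of the rebuilt factors only include values that survive the aggregation, so they can differ from the levels in `dtRes`.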
