Identify time sequence in data and subset by that sequence in R


Question

I am trying to write code that identifies the length of a repeating time sequence (in seconds) in R and subsets each sequence into its own data frame for curve fitting and analysis. Each sequence is a time series of sensor voltage output and has to be analyzed separately.

My code seems clunky, but it works as written here. I am trying to figure out whether there is a package or an easy step that I am missing that would do this more elegantly. The seconds are decimal seconds, and the data could be numeric or integer; it doesn't matter for this example. This is not the actual sensor output, but it has the same format.

set.seed(1)
all_data = data.frame( sec = rep(1.8:4,9), data = sample(1:27), data2 = sample(5:7))

#identify time step length in seconds
lowest = min(all_data$sec)
highest = max(all_data$sec)
#combine into a vector
time_step = c(lowest,highest)

#find index of first time period
matches = match(time_step,all_data[,1])
#subset first time period
total_measures = nrow(all_data)/matches[2]
all_data = all_data[matches[1]:nrow(all_data),]
# test_frame = data.frame(c(1,2))
n = matches[2]

#counter for number of measures in file
count = c(1:(nrow(all_data)/n))
count2 = c(0:(nrow(all_data)/n-1))
# subset to break each measure into its own workable file
eq = paste("subd",count," = all_data[((",count2,"*n)+1):(",count,"*n),]",sep = "")
eval(parse(text = eq))

Thanks!

Recommended answer

I would use data.table to give the rows an id for each subset.

library(data.table)
dt <- data.table(all_data)
dt[which.min(sec):nrow(dt), id:=1:.N, by=sec]
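
To make the id column concrete, here is a minimal base R sketch that should produce the same numbering on the example data. It assumes the data starts at the beginning of a cycle and that every cycle contains each sec value exactly once; the column names id and id2 are only illustrative.

```r
# Rebuild the example data from the question
set.seed(1)
all_data <- data.frame(sec = rep(1.8:4, 9), data = sample(1:27), data2 = sample(5:7))

# Number the occurrences of each sec value: the k-th time a given
# sec value appears, that row belongs to cycle k
all_data$id <- ave(seq_along(all_data$sec), all_data$sec, FUN = seq_along)

# Alternative that does not rely on equal-length cycles:
# start a new cycle whenever the time value drops
all_data$id2 <- cumsum(c(TRUE, diff(all_data$sec) < 0))
```

The second form may be the safer choice if a cycle can be missing a reading, since it only depends on the time value resetting.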

Then you can continue to split as you did:

count <- 1:dt[, max(id, na.rm=TRUE)]
eq = paste("subd", count," = data.frame(dt[id==", count, ",list(sec, data, data2)])", sep = "")
eval(parse(text = eq))
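
If separate subd1, subd2, ... variables are really wanted, the same result can probably be reached without building code strings. The sketch below uses a plain data frame with an id column (built here with cumsum as an assumption, not with data.table) and creates the variables with assign.

```r
# Rebuild the example data and a cycle id (assumes the time value
# drops at the start of every new cycle)
set.seed(1)
all_data <- data.frame(sec = rep(1.8:4, 9), data = sample(1:27), data2 = sample(5:7))
all_data$id <- cumsum(c(TRUE, diff(all_data$sec) < 0))

# Create one data frame per cycle: subd1, subd2, ...
for (k in sort(unique(all_data$id))) {
  assign(paste0("subd", k),
         all_data[all_data$id == k, c("sec", "data", "data2")])
}
```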

Alternatively, and more commonly in R, you can use split to divide the data into subsets. This returns a list of data.frames, which is very useful because you can then use lapply to apply a function (curve fitting, etc.) to every data.frame in one call.

split(data.frame(dt[, list(sec, data, data2)]), dt$id)
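
As a rough illustration of the split and lapply pattern, the sketch below fits a straight line to each cycle. The linear model is only a placeholder for whatever curve fitting the sensor data actually needs, and the cycle id is built in base R under the assumption that the time value drops at the start of each cycle.

```r
# Rebuild the example data and a cycle id
set.seed(1)
all_data <- data.frame(sec = rep(1.8:4, 9), data = sample(1:27), data2 = sample(5:7))
all_data$id <- cumsum(c(TRUE, diff(all_data$sec) < 0))

# One data frame per cycle, collected in a named list
pieces <- split(all_data[, c("sec", "data", "data2")], all_data$id)

# Fit the placeholder model to every cycle
fits <- lapply(pieces, function(d) lm(data ~ sec, data = d))

# Collect the coefficients: one column per cycle,
# rows are the intercept and the slope
coefs <- sapply(fits, coef)
```

Keeping the pieces in a list means the number of cycles never has to be hard-coded, and results can be looked up by id, for example fits[["3"]].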
