How to perform correlation between time-series of unequal frequencies

Question

I measured room temperature every minute for 36 minutes and skin temperature 32 times per second over the same period. I have 35 repeats of the experiment, each labelled with an ID. I need to look at the correlation between the two, but the samples are of unequal sizes.

Data:

I have a data.frame df1 with room temperature measured every minute and another data.frame df2 with skin temperature measured 32 times per second. I have 36 minutes' worth of data. In addition, there is a column called ID that gives the experiment number (1-35), but I don't know how to represent this in the example data below. So I'm looking for the correlation between skinTemp and roomTemp separately for each ID.

    df1 <- data.frame(
        roomTemp = rnorm(1*36)
    )

    df2 <- data.frame(
        skinTemp = rnorm(32*60*36)
    )

I tried doing:

Data <- data.frame(
  Y=c(df1,df2),
  Variable =factor(rep(c("RoomTemp", "SkinTemp"), times=c(length(df1), length(df2))))
)

cor(Data$Y~Data$Variable)

but that doesn't seem to work.

Solution

A rolling join or interpolation might be helpful for imputing roomTemp for times when skinTemp was measured. Below are examples of both. The first section is an update to deal with multiple IDs, followed by the original answer for the case of a single ID.
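To make the difference between the two concrete, here's a toy base-R sketch with three hypothetical room readings. approx() does the linear interpolation, and the nearest-neighbour lookup is a simple which.min() over the time differences (fine for a handful of points, but slow for tens of thousands of skin readings, which is where data.table's rolling join helps).

```r
# Toy data (hypothetical values): room temperature read once a minute
room_time <- c(0, 60, 120)      # seconds
room_temp <- c(20, 21, 23)      # degrees
skin_time <- c(15, 45, 105)     # times at which skin temperature was read

# Linear interpolation between the neighbouring room readings
interp <- approx(room_time, room_temp, xout = skin_time)$y

# Nearest neighbour: the room reading closest in time to each skin reading
nearest <- sapply(skin_time, function(t) room_temp[which.min(abs(room_time - t))])

interp   # 20.25 20.75 22.50
nearest  # 20 21 23
```

Interpolation blends the two surrounding readings, while the nearest-neighbour value jumps in steps each time a different room reading becomes the closest one.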

UPDATE: New version to deal with multiple IDs

This update addresses the case of data with multiple IDs where we want to either interpolate or do a rolling join separately for each ID.

library(data.table)
library(reshape2)
library(dplyr)
library(purrr)
library(ggplot2)
theme_set(theme_classic(base_size=16))

First, we'll create fake autocorrelated data for two separate IDs:

set.seed(395)
df1 <- data.frame(roomTemp = c(cumsum(rnorm(1*36)), cumsum(rnorm(1*36))),
                  ID = rep(c("A","B"), each=36))
df2 <- data.frame(skinTemp = c(cumsum(rnorm(32*60*36,0,0.01)),
                               cumsum(rnorm(32*60*36,0,0.01))),
                  ID = rep(c("A","B"), each=32*60*36))

Now we add a time column, but in this case I've also added a shift in df1, so that no df1 measurement happens at the same time as a df2 measurement, just to make the answer more general.

# Add time column
df1$time = rep(0:(0.5*nrow(df1)-1)*60 + 0.0438,2)
df2$time = rep(0:(0.5*nrow(df2)-1)/32, 2)

Convert the data frames to data tables. This time, we make ID a key column in addition to time so that the rolling join will occur separately for each ID.

# Convert data frames to data tables
setDT(df1)
setDT(df2)

# Make ID and time key columns in both data frames (for joining)
setkey(df1, ID, time)
setkey(df2, ID, time)

# Rolling join roomTemp to nearest time value of skinTemp
df2 = df1[df2, roll="nearest"]

# Rename rolling joined room temperature column
names(df2)[grep("roomTemp", names(df2))] = "roomTempRoll"
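If you want to cross-check the rolling join without data.table, the nearest-time match per ID can be sketched in base R with findInterval(). This is a sketch on simulated data shaped like the example above; nearest_index() is a hypothetical helper, not part of any package, and it sends ties to the earlier room reading, which may differ from how data.table breaks ties.

```r
# Hypothetical helper: for each query time, the index of the closest
# reference time. `ref` must be sorted in increasing order.
nearest_index <- function(query, ref) {
  i <- findInterval(query, ref)          # last ref <= query (0 if none)
  left  <- pmax(i, 1L)
  right <- pmin(i + 1L, length(ref))
  # take the right neighbour only if it is strictly closer; ties go left
  ifelse(abs(ref[right] - query) < abs(query - ref[left]), right, left)
}

# Simulated data shaped like the example above (two IDs, offset room times)
set.seed(395)
df1 <- data.frame(ID = rep(c("A", "B"), each = 36),
                  time = rep(0:35 * 60 + 0.0438, 2),
                  roomTemp = rnorm(72))
df2 <- data.frame(ID = rep(c("A", "B"), each = 32 * 60 * 36),
                  time = rep(0:(32 * 60 * 36 - 1) / 32, 2),
                  skinTemp = rnorm(2 * 32 * 60 * 36))

# Match within each ID separately
df2$roomTempRoll <- NA_real_
for (id in unique(df2$ID)) {
  room <- df1[df1$ID == id, ]
  room <- room[order(room$time), ]
  rows <- which(df2$ID == id)
  df2$roomTempRoll[rows] <- room$roomTemp[nearest_index(df2$time[rows], room$time)]
}
```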

To add the interpolated roomTemp by ID, I've used map_df from the purrr package. map_df operates separately on each ID. approx takes care of the interpolation. In the original answer I used approxfun to create an approximation function first, but here I've just done the interpolation directly in a single step. map_df returns a data frame, but we just need the y column, which has the interpolated values of roomTemp, so I've extracted those at the end of the dplyr function chain and assigned them to roomTempInterp in df2.

# Add interpolated room temperature by ID
df2$roomTempInterp = unique(df2$ID) %>% 
  map_df(~ approx(df1$time[df1$ID==.x], df1$roomTemp[df1$ID==.x], 
                  xout=df2$time[df2$ID==.x]), .id="ID") %>% .$y
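If you'd rather not depend on purrr, the same per-ID interpolation can be sketched as a plain base-R loop. Assigning by a logical index means it doesn't rely on df2 being sorted by ID. Note that approx() returns NA for times outside the range of room readings (its default rule = 1); with the offset room times, that covers the first two skin readings and everything after the last room reading, which is why the correlations below use pairwise.complete.obs. The data here are simulated like the example above.

```r
set.seed(395)
df1 <- data.frame(ID = rep(c("A", "B"), each = 36),
                  time = rep(0:35 * 60 + 0.0438, 2),
                  roomTemp = c(cumsum(rnorm(36)), cumsum(rnorm(36))))
df2 <- data.frame(ID = rep(c("A", "B"), each = 32 * 60 * 36),
                  time = rep(0:(32 * 60 * 36 - 1) / 32, 2))

df2$roomTempInterp <- NA_real_
for (id in unique(df2$ID)) {
  room <- df1$ID == id
  skin <- df2$ID == id
  # approx() sorts the (x, y) pairs itself; the result follows the order of xout
  df2$roomTempInterp[skin] <- approx(df1$time[room], df1$roomTemp[room],
                                     xout = df2$time[skin])$y
}
```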

In the plot below, we facet by ID so that we can see the imputed temperature values separately for each ID.

# Plot so we can see what the rolling joined room temperature and 
#  interpolated room temperature look like
ggplot(melt(df2, id.vars=c("ID", "time")), aes(time, value, colour=variable)) +
  geom_line(linewidth=0.7) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(data=df1, aes(time, roomTemp), colour="black") +
  facet_grid(ID ~ .)

Here's one way to get the correlations by ID:

df2 %>% group_by(ID) %>%
  summarise(r_interp = cor(skinTemp, roomTempInterp, use="pairwise.complete.obs"),
            r_roll = cor(skinTemp, roomTempRoll, use="pairwise.complete.obs"))

      ID    r_interp      r_roll
1      A -0.04853998 -0.02993207
2      B -0.53993960 -0.53092150
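The same per-ID correlations can be computed in base R by splitting the data frame on ID, which scales to all 35 experiments without extra packages. A minimal sketch on simulated columns (the NA values stand in for times outside the room-temperature range):

```r
set.seed(42)
df2 <- data.frame(ID = rep(c("A", "B"), each = 100),
                  skinTemp = rnorm(200),
                  roomTempInterp = rnorm(200))
df2$roomTempInterp[c(1, 2, 101)] <- NA

# One correlation per ID, ignoring rows with missing imputed room temperature
r_by_id <- sapply(split(df2, df2$ID), function(d)
  cor(d$skinTemp, d$roomTempInterp, use = "complete.obs"))
r_by_id
```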

Original Answer

First, I modified the sample data frames to add some autocorrelation, since that seemed a bit closer to your real experiment and makes visualization easier.

library(data.table)
library(reshape2)
library(dplyr)
library(ggplot2)
theme_set(theme_classic(base_size=16))

# Fake data with autocorrelation
set.seed(395)
df1 <- data.frame(roomTemp = cumsum(rnorm(1*36)))
df2 <- data.frame(skinTemp = cumsum(rnorm(32*60*36,0,0.01)))

Now add a time column. You can work with actual datetime columns, but here I've just gone with numeric columns denominated in seconds.

# Add time column
df1$time = 0:(nrow(df1)-1)*60
df2$time = 0:(nrow(df2)-1)/32
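If your loggers record real timestamps, one way to get the numeric seconds used here is difftime() against a common origin. A sketch, assuming hypothetical POSIXct timestamps; use the same origin (e.g. the experiment start) for both df1 and df2 so their time columns are comparable.

```r
# Hypothetical common origin: the moment the experiment started
start <- as.POSIXct("2020-01-01 10:00:00", tz = "UTC")
# Hypothetical logger timestamps for the first four skin readings (32 Hz)
skin_ts <- start + (0:3) / 32
# Seconds since the start, as a plain numeric column
skin_secs <- as.numeric(difftime(skin_ts, start, units = "secs"))
skin_secs  # 0.00000 0.03125 0.06250 0.09375
```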

For interpolation, we need a function that estimates the room temperature at each time a skin temperature was recorded, i.e. at the points in between the room temperature measurements. approxfun performs linear interpolation between points. You can also use splinefun in a similar way to interpolate using splines.

# Function to interpolate room temperature between measurements
roomTempInterp = approxfun(df1$time, df1$roomTemp)
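One thing to watch: approxfun() returns NA outside the range of the room readings by default (rule = 1). With room readings at 0 to 2100 s and skin readings running to about 2160 s, the final minute of skin data gets NA. A sketch on the same simulated setup comparing the default, rule = 2 (hold the end values constant) and splinefun(), which extrapolates instead of returning NA:

```r
set.seed(395)
room_time <- 0:35 * 60                    # last room reading at 2100 s
room_temp <- cumsum(rnorm(36))
skin_time <- 0:(32 * 60 * 36 - 1) / 32    # skin readings run to ~2160 s

f_na    <- approxfun(room_time, room_temp)             # default rule = 1: NA outside range
f_const <- approxfun(room_time, room_temp, rule = 2)   # carry the end values outward
f_spl   <- splinefun(room_time, room_temp, method = "natural")

sum(is.na(f_na(skin_time)))      # 1919 readings after 2100 s
sum(is.na(f_const(skin_time)))   # 0
```

Whether rule = 2 or spline extrapolation is reasonable depends on whether you're willing to assume the room temperature stayed put after the last reading; dropping the NA rows is the conservative choice.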

Convert the data frames to data tables in order to use data.table's rolling join functionality.

# Convert data frames to data tables
setDT(df1)
setDT(df2)

# Make time a key column in both data frames (for joining)
setkey(df1, time)
setkey(df2, time)

Now perform a rolling join to the nearest time value.

# Rolling join roomTemp to nearest time value of skinTemp
df2 = df1[df2, roll="nearest"]

# Rename rolling joined room temperature column
names(df2)[grep("roomTemp", names(df2))] = "roomTempRoll"
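roll = "nearest" takes the closest room reading in either direction, so a skin reading can be paired with a room reading taken slightly later. If you only want readings already taken at that moment (last observation carried forward, which is what data.table's roll = TRUE does), the same idea can be sketched in base R with approx(method = "constant"). Toy values are hypothetical:

```r
room_time <- c(0, 60, 120)
room_temp <- c(20, 21, 23)
skin_time <- c(15, 45, 105)

# f = 0: use the room reading at or before each skin time (carry forward)
locf <- approx(room_time, room_temp, xout = skin_time, method = "constant", f = 0)$y
# f = 1: use the next room reading instead (carry backward)
nocb <- approx(room_time, room_temp, xout = skin_time, method = "constant", f = 1)$y
locf  # 20 20 21
nocb  # 21 21 23
```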

Merge original roomTemp measurements from df1 into df2.

df2 = df1[df2, on = "time"]  # Join explicitly on time; equivalent to dplyr: df2 = left_join(df2, df1)
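For comparison, the same left join in base R is merge() with all.x = TRUE: every skin reading is kept, and roomTemp is NA wherever no room reading shares that exact time. A minimal sketch on hypothetical toy values:

```r
df1 <- data.frame(time = c(0, 60), roomTemp = c(-1.2, 0.4))
df2 <- data.frame(time = c(0, 30, 60, 90), skinTemp = c(0.1, 0.2, 0.3, 0.4))

# all.x = TRUE keeps all rows of df2; merge() sorts the result by time
merged <- merge(df2, df1, by = "time", all.x = TRUE)
merged$roomTemp  # -1.2 NA 0.4 NA
```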

Add the interpolated room temperature using the function we created above.

# Add interpolated room temperature
df2$roomTempInterp = roomTempInterp(df2$time)

The interpolation method seems more realistic to me, especially if we can assume roomTemp changes relatively smoothly and monotonically between measurements. Below are the first 10 rows of df2, which include the original df2 data plus the new roomTempRoll and roomTempInterp columns and the original roomTemp measurements from df1. You can now use this data frame to assess correlation and other relationships between roomTemp and skinTemp.

    roomTemp    time roomTempRoll     skinTemp roomTempInterp
 1: -1.21529 0.00000     -1.21529 -0.006511475      -1.215290
 2:       NA 0.03125     -1.21529 -0.014058076      -1.215531
 3:       NA 0.06250     -1.21529 -0.017741690      -1.215773
 4:       NA 0.09375     -1.21529 -0.030211177      -1.216014
 5:       NA 0.12500     -1.21529 -0.027105225      -1.216255
 6:       NA 0.15625     -1.21529 -0.035784295      -1.216497
 7:       NA 0.18750     -1.21529 -0.031319748      -1.216738
 8:       NA 0.21875     -1.21529 -0.033758959      -1.216979
 9:       NA 0.25000     -1.21529 -0.040667384      -1.217220
10:       NA 0.28125     -1.21529 -0.026291442      -1.217462

Below is a plot so you can see what the rolling join and interpolated values look like. The black dots mark the original roomTemp measurements.

ggplot(melt(df2 %>% select(-roomTemp), id.vars="time"), aes(time, value, colour=variable)) +
  geom_line(linewidth=1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(data=df2, aes(time, roomTemp), colour="black")
