时间序列数据的主成分分析(PCA):时空模式 [英] Principal component analysis (PCA) of time series data: spatial and temporal pattern

查看:2612
本文介绍了时间序列数据的主成分分析(PCA):时空模式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有1951年至1980年的100个气象站的年降水量数据.在一些论文中,我发现人们将PCA应用于时间序列,然后绘制空间负荷图(值从-1到1),并绘制出PC的时间序列.例如, https://publicaciones.unirioja中的图6 .es/ojs/index.php/cig/article/view/2931/2696 是PC的空间分布.

Suppose I have yearly precipitation data for 100 stations from 1951 to 1980. In some papers, I find people apply PCA to the time series and then plot the spatial loadings map (with values from -1 to 1), and also plot the time series of the PCs. For example, figure 6 in https://publicaciones.unirioja.es/ojs/index.php/cig/article/view/2931/2696 is the spatial distribution of the PCs.

我在R中使用函数prcomp,我想知道如何做同样的事情.换句话说,如何从prcomp函数的结果中提取空间模式"和时间模式"?谢谢.

I am using function prcomp in R and I wonder how I can do the same thing. In other words, how can I extract the "spatial pattern" and "temporal pattern" from the results of prcomp function? Thanks.

set.seed(1234)
rainfall = sample(x=100:1000,size = 100*30,replace = T)
rainfall=matrix(rainfall,nrow=100)
colnames(rainfall)=1951:1980
PCA = prcomp(rainfall,retx=T)

或者,在 https://1drv.ms/u/s!AnVl_zW00EHegxAprS4s7PDaYQVr

推荐答案

时间模式"解释了所有网格中时间序列的主要时间变化,并由主要成分(PC,多个时间序列).在R中,对于最重要的PC PC1,它是prcomp(data)$x[,'PC1'].

"Temporal pattern" explains the dominant temporal variation of time series in all grids, and it is represented by principal components (PCs, a number of time series) of PCA. In R, it is prcomp(data)$x[,'PC1'] for the most important PC, PC1.

空间模式"解释了PC取决于某些变量(在您的情况下为地理位置)的强大程度,并由每个主要组件的负载表示.例如,对于PC1,它是prcomp(data)$rotation[,'PC1'].

"Spatial pattern" explains how strong the PCs depend on some variables (geography in your case), and it is represented by the loadings of each principal components. For example, for PC1, it is prcomp(data)$rotation[,'PC1'].

以下是为R中的时空数据构造PCA并使用您的数据显示时间变化和空间异质性的示例.

首先,必须将数据转换为具有变量(空间网格)和观测值(yyyy-mm)的data.frame.

First of all, the data has to be transformed into a data.frame with variables (spatial grid) and observations (yyyy-mm).

加载和转换数据:

load('spei03_df.rdata') 
str(spei03_df) # the time dimension is saved as names (in yyyy-mm format) in the list
lat <- spei03_df$lat # latitude of each values of data
lon <- spei03_df$lon # longitude
rainfall <- spei03_df 
rainfall$lat <- NULL
rainfall$lon <- NULL
date <- names(rainfall)
rainfall <- t(as.data.frame(rainfall)) # columns are where the values belong, rows are the times

要了解数据,请在地图上绘制1950年1月的数据:

To understand the data, drawing on map the data for Jan 1950:

library(mapdata)
library(ggplot2) # for map drawing

drawing <- function(data, map, lonlim = c(-180,180), latlim = c(-90,90)) {
  major.label.x = c("180", "150W", "120W", "90W", "60W", "30W", "0", 
                    "30E", "60E", "90E", "120E", "150E", "180")
  major.breaks.x <- seq(-180,180,by = 30)
  minor.breaks.x <- seq(-180,180,by = 10)

  major.label.y = c("90S","60S","30S","0","30N","60N","90N")
  major.breaks.y <- seq(-90,90,by = 30)
  minor.breaks.y <- seq(-90,90,by = 10)
  panel.expand <- c(0,0)

  drawing <- ggplot() + 
    geom_path(aes(x = long, y = lat, group = group), data = map) + 
    geom_tile(data = data, aes(x = lon, y = lat, fill = val), alpha = 0.3, height = 2) + 
    scale_fill_gradient(low = 'white', high = 'red') + 
    scale_x_continuous(breaks = major.breaks.x, minor_breaks = minor.breaks.x, labels = major.label.x, 
                       expand = panel.expand,limits = lonlim) +
    scale_y_continuous(breaks = major.breaks.y, minor_breaks = minor.breaks.y, labels = major.label.y,
                       expand = panel.expand, limits = latlim) +
    theme(panel.grid = element_blank(), panel.background = element_blank(),
          panel.border = element_rect(fill = NA, color = 'black'),
          axis.ticks.length = unit(3,"mm"),
          axis.title = element_text(size = 0),
          legend.key.height = unit(1.5,"cm"))

  return(drawing)
}

map.global <- fortify(map(fill=TRUE, plot=FALSE))
dat <- data.frame(lon = lon, lat = lat, val = rainfall["1950-01",])
sample_plot <- drawing(dat, map.global, lonlim = c(-180,180), c(-90,90))
ggsave("sample_plot.png", sample_plot,width = 6,height=4,units = "in",dpi = 600)

如上所示,提供的链接提供的网格数据包括代表加拿大降雨(某些指标?)的值.

As shown above, the gridded data given by the link provided includes values that represent rainfall (some kinds of indexes?) in Canada.

主成分分析:

PCArainfall <- prcomp(rainfall, scale = TRUE)
summaryPCArainfall <- summary(PCArainfall)
summaryPCArainfall$importance[,c(1,2)]

这表明前两个PC解释了降雨数据中10.5%和9.2%的方差.

It shows that the first two PCs explain 10.5% and 9.2% of variance in the rainfall data.

我提取了前两台PC的负载和PC时间序列本身: 空间格局"(负荷)显示趋势强度(PC1和PC2)的空间异质性.

I extract the loadings of the first two PCs and the PC time series themselves: The "spatial pattern" (loadings), showing the spatial heterogeneity of the strengths of the trends (PC1 and PC2).

loading.PC1 <- data.frame(lon=lon,lat=lat,val=PCArainfall$rotation[,'PC1'])
loading.PC2 <- data.frame(lon=lon,lat=lat,val=PCArainfall$rotation[,'PC2'])

drawing.loadingPC1 <- drawing(loading.PC1,map.global, lonlim = c(-180,-30), latlim = c(40,90)) + ggtitle("PC1")
drawing.loadingPC2 <- drawing(loading.PC2,map.global, lonlim = c(-180,-30), latlim = c(40,90)) + ggtitle("PC2")

ggsave("loading_PC1.png",drawing.loadingPC1,width = 6,height=4,units = "in",dpi = 600)
ggsave("loading_PC2.png",drawing.loadingPC2,width = 6,height=4,units = "in",dpi = 600)

时间模式",前两个PC时间序列,显示了数据的主要时间趋势

The "temporal pattern", the first two PC time series, showing the dominant temporal trends of the data

library(xts)
PC1 <- ts(PCArainfall$x[,'PC1'],start=c(1950,1),end=c(2014,12),frequency = 12)
PC2 <- ts(PCArainfall$x[,'PC2'],start=c(1950,1),end=c(2014,12),frequency = 12)
png("PC-ts.png",width = 6,height = 4,res = 600,units = "in")
plot(as.xts(PC1),major.format = "%Y-%b", type = 'l', ylim = c(-100, 100), main = "PC") # the black one is PC1
lines(as.xts(PC2),col='blue',type="l") # the blue one is PC2
dev.off()

但是,此示例绝不是适合您的数据的最佳PCA,因为PC1和PC2存在严重的季节性和年度变化(当然,夏季下雨更多,而查看PC的弱尾巴).

This example is, however, by no means the best PCA for your data because there are serious seasonality and annual variations in the PC1 and PC2 (Of course, it rains more in summer, and look at the weak tails of the PCs).

您可以通过对数据进行季节性淡化或通过回归消除年度趋势来改善PCA,如文献所建议的那样.但这已经超出了我们的主题.

You can improve the PCA probably by deseasonalizing the data or removing the annual trend by regression, as in the literature suggested. But this is already beyond our topic.

这篇关于时间序列数据的主成分分析(PCA):时空模式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆