Plotting a large number of time series using ggplot. Is it possible to speed up?

Question

I am working with thousands of meteorological time series (sample data can be downloaded from here): https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt

Plotting these data with ggplot2 on my Linux Mint PC (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a long time. Is there a way to speed it up, or a better way to plot these data? Thank you very much in advance for any suggestions!

This is the code I'm using for now:

##############################################################################
#### load required libraries
library(RCurl)
library(reshape2)
library(dplyr)
library(ggplot2)

##############################################################################
#### Read data from URL
dataURL <- "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
tmp <- getURL(dataURL)
df <- tbl_df(read.table(text = tmp, header = TRUE))
df

##############################################################################
#### Plot time series using ggplot2
# Melt the data by date first
df_melt <- melt(df, id = "date")
str(df_melt)

df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
  geom_point() +
  scale_colour_discrete("Station #") +
  xlab("Date") +
  ylab("Daily Precipitation [mm]") +
  ggtitle("Daily precipitation from 1915 to 2011") +
  theme(plot.title = element_text(size = 16, face = "bold", vjust = 2)) + # Change size & distance of the title
  theme(axis.text.x = element_text(angle = 0, size = 12, vjust = 0.5)) + # Change size of tick text
  theme(axis.text.y = element_text(angle = 0, size = 12, vjust = 0.5)) +
  theme( # Move x- and y-axis labels away from the axes
    axis.title.x = element_text(size = 14, color = "black", vjust = -0.35),
    axis.title.y = element_text(size = 14, color = "black", vjust = 0.35)) +
  theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold")) + # Change Legend text size
  guides(colour = guide_legend(override.aes = list(size = 4))) + # Change legend symbol size
  guides(fill = guide_legend(ncol = 2))
df_plot

Recommended answer

Part of your question asks for a "better way to plot these data".

In that spirit, you seem to have two problems. First, you expect to plot >35,000 points along the x-axis, which, as some of the comments point out, will result in pixel overlap on anything but an extremely large, high-resolution monitor. Second, and more important IMO, you are trying to plot 69 time series (stations) on the same plot. In this type of situation a heatmap might be a better approach.

library(data.table)
library(ggplot2)
library(reshape2)          # for melt(...)
library(RColorBrewer)      # for brewer.pal(...)
url <-  "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
dt  <- fread(url)
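dt[,Year:=year(as.Date(date))]   # add the calendar year of each daily record

# drop the first column (date), then reshape to long form: one row per Year/Station/value
dt.melt  <- melt(dt[,-1,with=FALSE],id="Year",variable.name="Station")
# annual total per station
dt.agg   <- dt.melt[,list(y=sum(value)),by=list(Year,Station)]
# reverse the factor levels so the first station is drawn at the top of the y-axis
dt.agg[,Station:=factor(Station,levels=rev(levels(Station)))]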
ggplot(dt.agg,aes(x=Year,y=Station)) + 
  geom_tile(aes(fill=y)) +
  scale_fill_gradientn("Annual\nPrecip. [mm]",
                       colours=rev(brewer.pal(9,"Spectral")))+
  scale_x_continuous(expand=c(0,0))+
  coord_fixed()
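
If the sample file is not reachable, the same aggregate-then-tile idea can be tried on made-up data. The following is only a sketch under stated assumptions: the station names (X1 to X5), the date range, and the exponentially distributed "precipitation" values are all hypothetical and are not taken from the original dataset.

```r
library(data.table)
library(ggplot2)
library(RColorBrewer)      # for brewer.pal(...)

set.seed(42)               # reproducible synthetic data

# Hypothetical stand-in for the sample file: daily values for 5 stations, 1915-1924
dates    <- seq(as.Date("1915-01-01"), as.Date("1924-12-31"), by = "day")
stations <- paste0("X", 1:5)                      # made-up station names
dt <- data.table(date = dates)
dt[, (stations) := lapply(stations, function(s) rexp(.N, rate = 0.5))]

# Same steps as in the answer: add Year, melt to long form, sum per Year and Station
dt[, Year := year(date)]
dt.melt <- melt(dt, id.vars = "Year", measure.vars = stations,
                variable.name = "Station", value.name = "value")
dt.agg  <- dt.melt[, list(y = sum(value)), by = list(Year, Station)]

# One tile per Year/Station pair instead of one point per day and station
p <- ggplot(dt.agg, aes(x = Year, y = Station)) +
  geom_tile(aes(fill = y)) +
  scale_fill_gradientn("Annual\nPrecip. [mm]",
                       colours = rev(brewer.pal(9, "Spectral"))) +
  scale_x_continuous(expand = c(0, 0))
p
```

In this synthetic case, 3,653 days for each of 5 stations collapse to 50 tiles, which is the reason the heatmap renders quickly compared with a point per observation.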

Note the use of data.tables. Your dataset is fairly large (because of all the columns; 35,000 rows is not all that large). In this situation data.tables will speed up processing substantially, especially fread(...) which is much faster than the text import functions in base R.
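
To check the import speed claim on your own machine, a rough comparison along these lines may help. It is a sketch only: the table is synthetic (5,000 rows by 70 columns, with hypothetical column names), and the timings will vary with hardware and with the R and data.table versions in use.

```r
library(data.table)

set.seed(1)

# Hypothetical wide table: a date column plus 69 numeric columns
n_row <- 5000
n_col <- 69
wide <- data.frame(
  date = as.character(seq(as.Date("1915-01-01"), by = "day", length.out = n_row)),
  matrix(round(runif(n_row * n_col, 0, 50), 2), nrow = n_row)
)

# Write it out as a whitespace-separated text file with a header row
tmp <- tempfile(fileext = ".txt")
write.table(wide, tmp, row.names = FALSE, quote = FALSE)

# Time the base R reader and data.table's fread on the same file
t_base  <- system.time(df_base  <- read.table(tmp, header = TRUE))
t_fread <- system.time(dt_fread <- fread(tmp))

# Elapsed seconds for each reader
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

unlink(tmp)                # clean up the temporary file
```

Both calls should return a table with the same dimensions; only the time taken to get there differs.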
