Plotting a large number of time series using ggplot. Is it possible to speed it up?

Question

I am working with thousands of meteorological time series (sample data can be downloaded from https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt).

Plotting these data using ggplot2 on my Linux Mint PC (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a long time. Is there a way to speed it up, or a better way to plot these data? Thank you very much in advance for any suggestions!

Here is the code I am using now:

##############################################################################
#### load required libraries
library(RCurl)
library(reshape2)
library(dplyr)
library(ggplot2)

##############################################################################
#### Read data from URL
dataURL <- "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
tmp <- getURL(dataURL)
df <- as_tibble(read.table(text = tmp, header = TRUE)) # as_tibble() replaces the deprecated tbl_df()
df

##############################################################################
#### Plot time series using ggplot2
# Melt the data by date first
df_melt <- melt(df, id = "date")
str(df_melt)

df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
  geom_point() +
  scale_colour_discrete("Station #") +
  xlab("Date") +
  ylab("Daily Precipitation [mm]") +
  ggtitle("Daily precipitation from 1915 to 2011") +
  theme(plot.title = element_text(size = 16, face = "bold", vjust = 2)) + # Change size & distance of the title
  theme(axis.text.x = element_text(angle = 0, size = 12, vjust = 0.5)) + # Change size of tick text
  theme(axis.text.y = element_text(angle = 0, size = 12, vjust = 0.5)) +
  theme( # Move x- & y-axis labels away from the axes
    axis.title.x = element_text(size = 14, color = "black", vjust = -0.35),
    axis.title.y = element_text(size = 14, color = "black", vjust = 0.35)) +
  theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold")) + # Change legend title size
  guides(colour = guide_legend(ncol = 2, override.aes = list(size = 4))) # Two-column legend with larger symbols
df_plot

Recommended answer

Part of your question asks for a "better way to plot these data".

In that spirit, you seem to have two problems. First, you are trying to plot >35,000 points along the x-axis, which, as some of the comments point out, will result in pixel overlap on anything but an extremely large, high-resolution monitor. Second, and more important IMO, you are trying to plot 69 time series (stations) on the same plot. In this type of situation a heatmap might be a better approach.

library(data.table)        # for fread(...), melt(...), year(...)
library(ggplot2)
library(RColorBrewer)      # for brewer.pal(...)
url <- "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
dt  <- fread(url)
dt[, Year := year(as.Date(date))]

# Drop the date column, then reshape to long form: one row per (Year, Station, value)
dt.melt <- melt(dt[, -1, with = FALSE], id.vars = "Year", variable.name = "Station")
# Sum the daily values into annual totals per station
dt.agg  <- dt.melt[, list(y = sum(value)), by = list(Year, Station)]
# Reverse the factor levels so the first station is drawn at the top
dt.agg[, Station := factor(Station, levels = rev(levels(Station)))]
ggplot(dt.agg, aes(x = Year, y = Station)) +
  geom_tile(aes(fill = y)) +
  scale_fill_gradientn("Annual\nPrecip. [mm]",
                       colours = rev(brewer.pal(9, "Spectral"))) +
  scale_x_continuous(expand = c(0, 0)) +
  coord_fixed()

Note the use of data.table. Your dataset is fairly large (because of all the columns; 35,000 rows is not all that large). In this situation data.table will speed up processing substantially, especially fread(...), which is much faster than the text import functions in base R.
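
If you want to check the fread(...) claim on your own machine, a rough sketch like the following may help. It does not use the real file: it writes a synthetic table of roughly the same shape (35,000 rows, 69 station columns plus a date column; the column names `st1`, `st2`, ... are made up for illustration) and times both readers on it. Actual timings will depend on your hardware and package versions.

```r
library(data.table)

# Synthetic stand-in for the real data; shape and column names are assumptions
set.seed(1)
n_days     <- 35000
n_stations <- 69
fake <- as.data.table(matrix(round(rexp(n_days * n_stations), 2), nrow = n_days))
setnames(fake, paste0("st", seq_len(n_stations)))
fake[, date := as.character(seq(as.Date("1915-01-01"), by = "day", length.out = n_days))]

# Write a space-separated text file with a header row
tmp_file <- tempfile(fileext = ".txt")
fwrite(fake, tmp_file, sep = " ")

# Time the base R reader and fread on the same file
t_base  <- system.time(df_base  <- read.table(tmp_file, header = TRUE))
t_fread <- system.time(dt_fread <- fread(tmp_file))

# Elapsed seconds for each reader
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))
```

Both calls read the same file, so any difference in elapsed time comes from the reader itself rather than from the download.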

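If you would rather keep a line or point plot than switch to a heatmap, one possible way to address the pixel-overlap problem is to aggregate before plotting, so ggplot2 has far fewer points to draw. This is not part of the answer above, just a sketch under assumptions: it uses synthetic daily data for three hypothetical stations (`st1` to `st3`) and collapses it to monthly totals.

```r
library(data.table)
library(ggplot2)

# Synthetic daily data in long form: 3 hypothetical stations, 10 years
set.seed(42)
days  <- seq(as.Date("2000-01-01"), as.Date("2009-12-31"), by = "day")
daily <- data.table(
  date    = rep(days, times = 3),
  Station = rep(c("st1", "st2", "st3"), each = length(days)),
  value   = rexp(3 * length(days))
)

# Label each day with the first day of its month, then sum per station and month
daily[, month := as.Date(format(date, "%Y-%m-01"))]
monthly <- daily[, .(value = sum(value)), by = .(Station, month)]

# Plot the much smaller monthly table
p <- ggplot(monthly, aes(x = month, y = value, colour = Station)) +
  geom_line() +
  labs(x = "Date", y = "Monthly precipitation [mm]")
print(p)
```

Monthly totals cut the number of points by a factor of about 30. Whether that loss of daily detail is acceptable depends on what you want the plot to show.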