Plotting a large number of time series using ggplot. Is it possible to speed it up?
Question
I am working with thousands of meteorological time series (sample data can be downloaded from https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt).
Plotting these data with ggplot2 on my Linux Mint PC (64-bit, 8 GB RAM, dual-core 2.6 GHz) takes a long time. Is there a way to speed it up, or a better way to plot these data? Thank you very much in advance for any suggestions!
Here is the code I am using now:
##############################################################################
#### load required libraries
library(RCurl)
library(reshape2)
library(dplyr)
library(ggplot2)
##############################################################################
#### Read data from URL
dataURL <- "https://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
tmp <- getURL(dataURL)
df <- tbl_df(read.table(text = tmp, header = TRUE))
df
##############################################################################
#### Plot time series using ggplot2
# Melt the data by date first
df_melt <- melt(df, id = "date")
str(df_melt)
df_plot <- ggplot(data = df_melt, aes(x = date, y = value, color = variable)) +
  geom_point() +
  scale_colour_discrete("Station #") +
  xlab("Date") +
  ylab("Daily Precipitation [mm]") +
  ggtitle("Daily precipitation from 1915 to 2011") +
  theme(plot.title = element_text(size = 16, face = "bold", vjust = 2)) + # Change size & distance of the title
  theme(axis.text.x = element_text(angle = 0, size = 12, vjust = 0.5)) + # Change size of tick text
  theme(axis.text.y = element_text(angle = 0, size = 12, vjust = 0.5)) +
  theme( # Move x- & y-axis labels away from the axes
    axis.title.x = element_text(size = 14, color = "black", vjust = -0.35),
    axis.title.y = element_text(size = 14, color = "black", vjust = 0.35)) +
  theme(legend.title = element_text(colour = "chocolate", size = 14, face = "bold")) + # Change legend title size
  guides(colour = guide_legend(override.aes = list(size = 4))) + # Change legend symbol size
  guides(fill = guide_legend(ncol = 2))
df_plot
Recommended answer
Part of your question asks for a "better way to plot these data".
In that spirit, you seem to have two problems. First, you expect to plot >35,000 points along the x-axis, which, as some of the comments point out, will result in pixel overlap on anything but an extremely large, high-resolution monitor. Second, and more important IMO, you are trying to plot 69 time series (stations) on the same plot. In this type of situation a heatmap might be a better approach.
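To put rough numbers on the overlap, here is a small back-of-the-envelope sketch. The plot width of 1920 pixels is an assumption chosen for illustration (a typical full-HD screen), and 35,000 is used as a round lower bound for the number of daily records.

```r
# Rough overplotting arithmetic (plot_width_px is an assumed value)
n_days        <- 35000   # lower bound on daily records per station
n_stations    <- 69      # number of station columns
plot_width_px <- 1920    # assumed horizontal resolution of the plot area

dates_per_px <- n_days / plot_width_px   # distinct dates sharing one pixel column
total_points <- n_days * n_stations      # points geom_point() has to draw

cat("Dates per pixel column:", round(dates_per_px, 1), "\n")
cat("Total points drawn:    ", format(total_points, big.mark = ","), "\n")
```

With well over a dozen dates per pixel column and more than two million points in total, summing to annual totals before plotting, as the heatmap code that follows does, loses little visible detail.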
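library(data.table)
library(ggplot2)
library(RColorBrewer) # for brewer.pal(...)
url <- "http://dl.dropboxusercontent.com/s/bxioonfzqa4np6y/timeSeries.txt"
dt <- fread(url)
# Add the calendar year of each daily record
dt[, Year := year(as.Date(date))]
# Drop the date column and reshape to long format: one row per (Year, Station, value).
# melt(...) is data.table's own method here (no reshape2 needed)
dt.melt <- melt(dt[, -1, with = FALSE], id.vars = "Year", variable.name = "Station")
# Total precipitation per station and year
dt.agg <- dt.melt[, list(y = sum(value)), by = list(Year, Station)]
# Reverse the station order so the first station appears at the top of the y-axis
dt.agg[, Station := factor(Station, levels = rev(levels(Station)))]
ggplot(dt.agg, aes(x = Year, y = Station)) +
  geom_tile(aes(fill = y)) +
  scale_fill_gradientn("Annual\nPrecip. [mm]",
                       colours = rev(brewer.pal(9, "Spectral"))) +
  scale_x_continuous(expand = c(0, 0)) +
  coord_fixed()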
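To see what the melt-and-aggregate step produces, here is a toy sketch with the same structure as the code above. The table `toy` and its station columns `st1` and `st2` are made up for illustration and are not part of the real dataset.

```r
library(data.table)

# Hypothetical miniature dataset: four daily records for two stations
toy <- data.table(
  date = as.Date(c("1915-12-30", "1915-12-31", "1916-01-01", "1916-01-02")),
  st1  = c(1, 2, 3, 4),
  st2  = c(0, 5, 0, 1)
)

# Same steps as for the real data: add Year, drop date, reshape to long, sum per group
toy[, Year := year(date)]
toy_long <- melt(toy[, -1, with = FALSE], id.vars = "Year", variable.name = "Station")
toy_agg  <- toy_long[, list(y = sum(value)), by = list(Year, Station)]

print(toy_agg)  # one row per (Year, Station) with the annual total in y
```

Each (Year, Station) pair collapses to a single row, which is why the heatmap needs only one tile per station per year instead of one point per station per day.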
Note the use of data.table. Your dataset is fairly large (because of all the columns; 35,000 rows is not all that large). In this situation data.table will speed up processing substantially, especially fread(...), which is much faster than the text import functions in base R.
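If you want to check the import speed on your own machine, a minimal sketch along these lines may help. It writes a synthetic file of roughly the same shape (35,000 rows, 69 value columns) to a temporary location; the random values and the column names are assumptions for illustration, not the real data, and the timings will vary by machine.

```r
library(data.table)

set.seed(1)
n_days     <- 35000
n_stations <- 69

# Synthetic stand-in for the real file: a date column plus 69 numeric columns
fake <- data.frame(
  date = as.character(seq(as.Date("1915-01-01"), by = "day", length.out = n_days)),
  matrix(round(rexp(n_days * n_stations), 2), nrow = n_days)
)
tmp <- tempfile(fileext = ".txt")
fwrite(fake, tmp, sep = " ")

# Time the base R reader against fread on the same file
t_base  <- system.time(df_base  <- read.table(tmp, header = TRUE))
t_fread <- system.time(dt_fread <- fread(tmp))

print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))
```

Both calls should return a table with the same dimensions, so the elapsed times can be compared directly.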