ggplot: Plotting time series data with missing values

Problem description

I have been trying to plot two columns of a data frame I created against each other. The first column, "Time", holds daily dates (format YYYY-MM-DD), and the second, "data1", holds the precipitation amount as a numeric value.

The data is taken from an Excel file, "St Lucia3", which contains 11598 data points of daily precipitation from 1981 to 2018, stored in two columns:

1) YearMonthDay (format "YYYYMMDD", e.g. "19810501")

2) Rainfall (mm)

The code for importing the data into R:

library(readxl)  # provides read_excel()
StLucia <- read_excel("C:/Users/hp/Desktop/St Lucia3.xlsx")

The code for the date column "Time":

Time <- as.Date(as.character(StLucia$YearMonthDay), format= "%Y%m%d")
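Since 11598 rows is well under the number of days between 1981 and 2018, a quick check after parsing is to compare the row count with the number of calendar days the dates span. A minimal base R sketch, using made-up YYYYMMDD values (hypothetical, not the St Lucia data) in place of StLucia$YearMonthDay:

```r
# Hypothetical sample standing in for StLucia$YearMonthDay (numeric YYYYMMDD)
YearMonthDay <- c(19810501, 19810502, 19810504, 19810505)

# Parse the YYYYMMDD numbers into Date objects
Time <- as.Date(as.character(YearMonthDay), format = "%Y%m%d")

# Any value that did not match the format would have become NA
stopifnot(!anyNA(Time))

# Number of calendar days spanned, first to last date inclusive
n_days <- as.integer(max(Time) - min(Time)) + 1L
n_rows <- length(Time)

# Fewer rows than days means some dates are absent from the data
cat("days spanned:", n_days, " rows:", n_rows, "\n")
```

Here the sample spans 5 days but has only 4 rows, so one date is missing.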

The code for the precipitation column "data1":

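library(imputeTS)  # provides na.ma() (newer imputeTS versions name it na_ma())
# Replace NA rainfall values with an exponentially weighted moving average (window of k = 4 observations on each side)
data1 <- na.ma(StLucia$`Rainfall (mm)`, k = 4, weighting = "exponential")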

The code for the data frame "Precip1":

Precip1 <- data.frame(Time, data1, check.rows=TRUE)

The code for ggplot is:

library(ggplot2)
ggplot(data = Precip1, mapping = aes(x = Time, y = data1)) + geom_line()

Plotting "data1" against "Time" with ggplot gives the following result:

[Figure: ggplot line plot of data1 versus Time]

Can someone explain why there is an "unusual kink" at the right end of the graph, even though no such values exist in the column "data1"?

Plotting "data1" against its index looks like this:

[Figure: base R line plot of data1 versus its index]

The code for this plot:

plot(data1, type = "l")

Any help would be highly appreciated. Thanks!

Recommended answer

By using pad() from the padr package, we can insert the missing dates and assign them NA values, so that nothing is drawn in the regions with missing data.

library(padr)
library(zoo)

YearMonthDay<-c(19810501,19810502,19810504,19810505)
Data<-c(1,2,3,4)

StLucia<-data.frame(YearMonthDay,Data)

StLucia$YearMonthDay <- as.Date(as.character(StLucia$YearMonthDay), format = "%Y%m%d")

> StLucia
  YearMonthDay Data
1   1981-05-01    1
2   1981-05-02    2
3   1981-05-04    3
4   1981-05-05    4

Note: a date (1981-05-03) is missing, yet rows 2 and 3 are adjacent, so a plot against the row index shows no gap. Against the actual dates, geom_line() simply joins the two neighbouring observations with a straight segment across the missing period.
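To find exactly which dates are absent, the observed dates can be compared with the complete daily sequence between the first and last observation. A minimal base R sketch, using the same hypothetical four dates as the example above rather than the real St Lucia data:

```r
# Hypothetical example dates, with 1981-05-03 absent
Time <- as.Date(c("1981-05-01", "1981-05-02", "1981-05-04", "1981-05-05"))

# Consecutive differences in days; a value > 1 marks a gap after that row
step <- diff(Time)
gap_after <- which(step > 1)

# The complete daily sequence between the first and last observation
full <- seq(min(Time), max(Time), by = "day")

# Dates in the complete sequence that are absent from the data
missing_dates <- full[!full %in% Time]
print(missing_dates)
```

On real data, a long run of consecutive missing dates found this way would explain a straight segment such as the kink in the question.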

So let's add the missing date:

 StLucia<-pad(StLucia,interval="day")

> StLucia
   YearMonthDay Data
 1   1981-05-01    1
 2   1981-05-02    2
 3   1981-05-03   NA
 4   1981-05-04    3
 5   1981-05-05    4

 plot(StLucia, type = "l")
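If padr is not available, the same padding can be approximated in base R by left-joining the data onto a full daily calendar with merge(). This is a swapped-in technique, not padr's implementation; the sketch below assumes a data frame shaped like the example above (a Date column YearMonthDay and a value column Data):

```r
# Hypothetical example data with 1981-05-03 missing (mirrors the answer)
StLucia <- data.frame(
  YearMonthDay = as.Date(c("1981-05-01", "1981-05-02", "1981-05-04", "1981-05-05")),
  Data = c(1, 2, 3, 4)
)

# Build a frame holding every day between the first and last date
all_days <- data.frame(
  YearMonthDay = seq(min(StLucia$YearMonthDay), max(StLucia$YearMonthDay), by = "day")
)

# Left-join the observations onto the full calendar; absent days get NA
padded <- merge(all_days, StLucia, by = "YearMonthDay", all.x = TRUE)

# merge() sorts by the key column by default; order explicitly to be safe
padded <- padded[order(padded$YearMonthDay), ]
print(padded)
```

The padded frame can then be passed to the question's ggplot(...) + geom_line() call. geom_line() leaves a gap at the NA rows (ggplot2 typically warns that rows with missing values were removed) instead of drawing a straight line across the missing days.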

If you want to fill in those NA values instead, use na.locf() from the zoo package, which carries the last observed value forward.
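To show what "last observation carried forward" does, here is a base R sketch with a hypothetical helper, locf(), intended to behave like zoo::na.locf(x, na.rm = FALSE). With zoo's default na.rm = TRUE, leading NAs are dropped instead of kept.

```r
# Hypothetical helper: last observation carried forward in base R
locf <- function(x) {
  # Position of each element if it is observed, 0 if it is NA
  observed_pos <- seq_along(x) * !is.na(x)
  # Running maximum = position of the most recent observed value
  idx <- cummax(observed_pos)
  # Leading NAs have no earlier value to carry, so keep them NA
  idx[idx == 0] <- NA
  x[idx]
}

filled <- locf(c(NA, 1, 2, NA, NA, 3))
print(filled)
```

For daily rainfall, carrying the previous day's value forward is a strong assumption. Leaving the NAs in place (so the plot shows a gap) is often the more honest choice.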
