as.Date creates some NAs in dataset

Question

I have a simple little dataset:

> str(SFdischg)
'data.frame':   11932 obs. of  4 variables:
 $ date: Factor w/ 11932 levels "1/01/1985","1/01/1986",..: 97 4409 8697 9677 10069 10461 10853 11245 11637 489 ...
 $ ddmm: Factor w/ 366 levels "01-Apr","01-Aug",..: 1 13 25 37 49 61 73 85 97 109 ...
 $ year: int  1984 1984 1984 1984 1984 1984 1984 1984 1984 1984 ...
 $ cfs : int  1500 1430 1500 1850 1810 1830 1850 1880 1970 1980 ...

I would like to have a column of dates so that I can plot temporal data:

> SFdischg$daymo <- as.Date(SFdischg$ddmm, format="%d-%b")
> summary(SFdischg)
    date            ddmm            year           cfs           daymo           
 1/01/1985:    1   01-Apr :   33   Min.   :1984   Min.   : 172   Min.   :2018-01-01  
 1/01/1986:    1   01-Aug :   33   1st Qu.:1992   1st Qu.: 705   1st Qu.:2018-04-04  
 1/01/1987:    1   01-Jul :   33   Median :2000   Median : 948   Median :2018-07-03  
 1/01/1988:    1   01-Jun :   33   Mean   :2000   Mean   :1374   Mean   :2018-07-02  
 1/01/1989:    1   01-May :   33   3rd Qu.:2008   3rd Qu.:1340   3rd Qu.:2018-10-01  
 1/01/1990:    1   01-Nov :   33   Max.   :2016   Max.   :8100   Max.   :2018-12-31  
 (Other)  :11926   (Other):11734                                 NA's   :8           

However, daymo now has 8 NAs and I can't understand why (and it makes plotting difficult!). Where does the handful of NAs come from when there is no missing data in ddmm? How can I avoid them? Am I missing something obvious?

Recommended answer

My guess is that some of the factor values in the ddmm column cannot be parsed into a valid date. You can reveal these bad values with:

SFdischg$ddmm[is.na(as.Date(SFdischg$ddmm, format="%d-%b"))]

Note that since there is no year component in the ddmm column, R fills in the current year (2018 at the time of writing) when it builds the date. Ideally, you should build your dates from source information that includes a year.
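As a rough sketch of what "including a year" could look like, the snippet below pastes a day-month value together with its year before parsing. The data frame `df` and the column name `fulldate` are made up for illustration and only stand in for SFdischg; setting LC_TIME to "C" is an assumption so that `%b` matches English month abbreviations such as "Feb".

```r
# Assumption: force English month abbreviations so "%b" matches "Feb", "Mar", ...
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical toy data standing in for SFdischg (ddmm is a factor, year an integer)
df <- data.frame(
  ddmm = factor(c("28-Feb", "29-Feb", "01-Mar")),
  year = c(2016L, 2016L, 2016L)
)

# paste() converts the factor to its labels, giving e.g. "29-Feb-2016",
# which is then parsed with an explicit four-digit year
df$fulldate <- as.Date(paste(df$ddmm, df$year, sep = "-"), format = "%d-%b-%Y")

print(df)
```

On the real data the same idea would be `as.Date(paste(SFdischg$ddmm, SFdischg$year, sep = "-"), format = "%d-%b-%Y")`. The existing `date` column might also work directly if it really is day/month/year, but the str() output alone does not confirm that.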

Edit: Based on your comment below, the offending rows had 29-Feb as the date. This implies that these dates were not from 2018 at all: 2018 was not a leap year, so its February had only 28 days and 29-Feb cannot be parsed for that year. This illustrates the importance of working with a full set of information, including the year, when parsing dates.
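To see the leap-year effect in isolation, here is a small check that spells the year out explicitly, so the result does not depend on when it is run. The variable names are only illustrative, and the LC_TIME setting is again an assumption to keep "Feb" parseable with `%b`.

```r
# Assumption: English month abbreviations for "%b"
invisible(Sys.setlocale("LC_TIME", "C"))

# 29 February in a non-leap year is not a valid date, so parsing yields NA
non_leap <- as.Date("29-Feb-2018", format = "%d-%b-%Y")

# The same day-month in a leap year parses normally
leap <- as.Date("29-Feb-2016", format = "%d-%b-%Y")

is.na(non_leap)  # TRUE
format(leap)     # "2016-02-29"
```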
