R shows NA although a value is present

Problem description

I have two columns of POSIXlt times with no NA values, yet NA values show up when I check:

> sum(is.na(check$start))
[1] 19 
> sum(is.na(check$end))
[1] 23

The data is present in the cells, so why does this happen? I have heard that this can happen with POSIXlt, but even when I convert to POSIXct the behavior is very strange. How does one go about solving this?

> as.POSIXct(check$start, format = "%Y-%m-%d %H:%M:%S", tz = "CST6CDT")
 [1] NA                        "2014-03-09 01:35:01 CST" NA                        "2014-03-09 01:53:30 CST" NA                       
 [6] NA                        NA                        NA                        NA                        "2014-03-09 04:17:11 CDT"
[11] NA                        NA                        "2015-03-08 01:54:43 CST" NA                        NA                       
[16] NA                        NA                        NA                        NA                        NA                       
[21] NA                        NA                        NA  


> dput(check)
structure(list(start = structure(list(sec = c(24, 1, 27, 30, 
8, 21, 40, 9, 43, 11, 31, 43, 43, 55, 39, 54, 41, 19, 2, 35, 
6, 54, 40), min = c(45L, 35L, 14L, 53L, 36L, 37L, 47L, 48L, 54L, 
17L, 57L, 53L, 54L, 3L, 52L, 22L, 34L, 28L, 41L, 42L, 52L, 52L, 
53L), hour = c(2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), mday = c(9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L), mon = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    year = c(114L, 114L, 114L, 114L, 114L, 114L, 114L, 114L, 
    114L, 114L, 114L, 115L, 115L, 115L, 115L, 115L, 115L, 115L, 
    115L, 115L, 115L, 115L, 115L), wday = c(0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), yday = c(67L, 67L, 67L, 67L, 67L, 67L, 67L, 
    67L, 67L, 67L, 67L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 
    66L, 66L, 66L, 66L), isdst = c(-1L, 0L, -1L, 0L, -1L, -1L, 
    -1L, -1L, -1L, 1L, -1L, -1L, 0L, -1L, -1L, -1L, -1L, -1L, 
    -1L, -1L, -1L, -1L, -1L), zone = c("", "CST", "", "CST", 
    "", "", "", "", "", "CDT", "", "", "CST", "", "", "", "", 
    "", "", "", "", "", ""), gmtoff = c(NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon", 
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt", 
"POSIXt"), tzone = c("CST6CDT", "CST", "CDT")), end = structure(list(
    sec = c(7, 59, 38, 45, 29, 46, 39, 14, 52, 29, 37, 5, 23, 
    41, 10, 43, 46, 46, 53, 24, 57, 13, 51), min = c(55L, 47L, 
    30L, 2L, 43L, 51L, 53L, 56L, 54L, 54L, 57L, 56L, 6L, 3L, 
    13L, 29L, 37L, 32L, 48L, 47L, 55L, 55L, 55L), hour = c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L), mday = c(9L, 9L, 9L, 9L, 9L, 
    9L, 9L, 9L, 9L, 9L, 9L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 
    8L, 8L, 8L), mon = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    year = c(114L, 114L, 114L, 114L, 114L, 114L, 114L, 114L, 
    114L, 114L, 114L, 115L, 115L, 115L, 115L, 115L, 115L, 115L, 
    115L, 115L, 115L, 115L, 115L), wday = c(0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L), yday = c(67L, 67L, 67L, 67L, 67L, 67L, 67L, 
    67L, 67L, 67L, 67L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 
    66L, 66L, 66L, 66L), isdst = c(-1L, -1L, -1L, -1L, -1L, -1L, 
    -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, 
    -1L, -1L, -1L, -1L, -1L), zone = c("", "", "", "", "", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    "", ""), gmtoff = c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
    )), .Names = c("sec", "min", "hour", "mday", "mon", "year", 
"wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt", 
"POSIXt"), tzone = c("CST6CDT", "CST", "CDT"))), .Names = c("start", 
"end"), row.names = c(1559963L, 1560092L, 1560157L, 1560220L, 
1560240L, 1560247L, 1560252L, 1560253L, 1560255L, 1560258L, 1560260L, 
2004432L, 2004583L, 2004591L, 2004594L, 2004596L, 2004598L, 2004599L, 
2004600L, 2004603L, 2004609L, 2004610L, 2004611L), class = "data.frame")

Solution

How does is.na work in this context?

> is.na.POSIXlt
function (x) 
is.na(as.POSIXct(x))
<bytecode: 0x0000000014232980>

How does as.POSIXct behave here?

> as.POSIXct(check$start)
 [1] NA                        "2014-03-09 01:35:01 CST" NA                        "2014-03-09 01:53:30 CST"
 [5] NA                        NA                        NA                        NA                       
 [9] NA                        "2014-03-09 04:17:11 CDT" NA                        NA                       
[13] "2015-03-08 01:54:43 CST" NA                        NA                        NA                       
[17] NA                        NA                        NA                        NA                       
[21] NA                        NA                        NA                       

OK, but why?

Let's check the documentation of as.POSIXct:

Any conversion that needs to go between the two date-time classes requires a time zone: conversion from "POSIXlt" to "POSIXct" will validate times in the selected time zone. One issue is what happens at transitions to and from DST, for example in the UK

Let's see:

> check$start$zone
 [1] ""    "CST" ""    "CST" ""    ""    ""    ""    ""    "CDT" ""    ""    "CST" ""    ""    ""    ""    ""    ""    ""   
[21] ""    ""    ""   

And here be dragons: only 4 entries have a time zone, so as.POSIXct can't tell whether the other dates are valid (inside the DST change or not), as you can see with:

> check$start$isdst
 [1] -1  0 -1  0 -1 -1 -1 -1 -1  1 -1 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1

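So the conversion from POSIXlt (your data frame columns) to POSIXct cannot validate these times and returns NA. The data show why: every entry that ends up NA has a clock time between 02:00 and 02:59 on 2014-03-09 or 2015-03-08, the two dates on which US clocks (CST6CDT included) jumped from 02:00 CST straight to 03:00 CDT. Those local times never existed, so no zone or DST flag could be attached to them; the four start times that do convert (01:35, 01:53, 01:54 and 04:17) all lie outside that hour.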
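To check the DST explanation against the data, here is a small sketch that flags clock times falling in the hour skipped when US DST starts. It assumes the US rule in force since 2007 (clocks jump from 02:00 to 03:00 on the second Sunday of March); `in_us_dst_gap` is a hypothetical helper, not part of base R. The `mday` and `hour` values are copied from `dput(check)`, and the year is inferred from `mday` (9 March 2014 vs 8 March 2015), as in the printed output.

```r
# Hypothetical helper (not part of base R): TRUE when a wall-clock time falls
# in the hour skipped when US DST starts. Assumes the post-2007 US rule:
# clocks jump from 02:00 standard time to 03:00 daylight time on the
# second Sunday of March. `mon` is 1-based here (3 = March).
in_us_dst_gap <- function(year, mon, mday, hour) {
  # Weekday of 1 March of each year (0 = Sunday), computed on plain Dates
  wday_march1 <- as.POSIXlt(as.Date(paste0(year, "-03-01")))$wday
  # Day of the month of the second Sunday in March
  second_sunday <- 1 + (7 - wday_march1) %% 7 + 7
  mon == 3 & mday == second_sunday & hour == 2
}

# mday and hour copied from dput(check); year inferred from mday
mday       <- c(rep(9, 11), rep(8, 12))
year       <- ifelse(mday == 9, 2014, 2015)
start_hour <- c(2, 1, 2, 1, 2, 2, 2, 2, 2, 4, 2, 2,
                1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
end_hour   <- rep(2, 23)

start_gap <- in_us_dst_gap(year, 3, mday, start_hour)
end_gap   <- in_us_dst_gap(year, 3, mday, end_hour)
sum(start_gap)     # 19, the same count as sum(is.na(check$start))
sum(end_gap)       # 23, the same count as sum(is.na(check$end))
which(!start_gap)  # 2 4 10 13, the entries as.POSIXct could convert
```

The helper only does date arithmetic on the components, so its result does not depend on the platform's time zone database.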

One possible fix is to enforce a time zone on all records:

> check$start <- as.POSIXlt(strftime(check$start,tz="CST"),tz="CST6CDT")
> is.na(check$start)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
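Another option, if you need the clock times exactly as recorded, is to rebuild them in a zone that has no DST, so that 02:xx on these dates is a valid instant. A minimal sketch, assuming the values were recorded in Central Standard Time: `Etc/GMT+6` is the IANA name for a fixed UTC-6 offset (the sign is inverted by POSIX convention), and `to_fixed_cst` is a hypothetical helper built on base R's `ISOdatetime`.

```r
# Hypothetical helper: rebuild POSIXlt-style wall-clock components as POSIXct
# in a fixed UTC-6 zone. "Etc/GMT+6" means UTC-6 (POSIX sign convention) and
# has no DST, so no clock time is skipped. Assumes the source values were
# recorded in Central Standard Time.
to_fixed_cst <- function(lt) {
  ISOdatetime(lt$year + 1900, lt$mon + 1, lt$mday,
              lt$hour, lt$min, lt$sec, tz = "Etc/GMT+6")
}

# Stand-in for check$start: a plain list with the same components
# (year since 1900, 0-based month) as the first and last rows of the data
demo <- list(year = c(114L, 115L), mon = c(2L, 2L), mday = c(9L, 8L),
             hour = c(2L, 2L), min = c(45L, 53L), sec = c(24, 40))
fixed <- to_fixed_cst(demo)
is.na(fixed)                         # FALSE FALSE
format(fixed, "%Y-%m-%d %H:%M:%S")   # "2014-03-09 02:45:24" "2015-03-08 02:53:40"
```

On the real data the call would be `to_fixed_cst(check$start)`. The trade-off is that the result no longer follows CST6CDT's daylight-saving rules, which is only correct if the source really never switched to CDT.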
