R shows NA although a value is present
Problem description
I have two columns of POSIXlt times with no NA values, yet NA values show up when I check:
> sum(is.na(check$start))
[1] 19
> sum(is.na(check$end))
[1] 23
The data is present in the cells, so why does this happen? I have heard that this can happen with POSIXlt, but even when I convert the column to POSIXct the behavior is very strange. How does one go about solving this?
> as.POSIXct(check$start, format = "%Y-%m-%d %H:%M:%S", tz = "CST6CDT")
 [1] NA "2014-03-09 01:35:01 CST" NA "2014-03-09 01:53:30 CST" NA
 [6] NA NA NA NA "2014-03-09 04:17:11 CDT"
[11] NA NA "2015-03-08 01:54:43 CST" NA NA
[16] NA NA NA NA NA
[21] NA NA NA
> dput(check)
structure(list(start = structure(list(sec = c(24, 1, 27, 30,
8, 21, 40, 9, 43, 11, 31, 43, 43, 55, 39, 54, 41, 19, 2, 35,
6, 54, 40), min = c(45L, 35L, 14L, 53L, 36L, 37L, 47L, 48L, 54L,
17L, 57L, 53L, 54L, 3L, 52L, 22L, 34L, 28L, 41L, 42L, 52L, 52L,
53L), hour = c(2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 4L, 2L, 2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), mday = c(9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L), mon = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
year = c(114L, 114L, 114L, 114L, 114L, 114L, 114L, 114L,
114L, 114L, 114L, 115L, 115L, 115L, 115L, 115L, 115L, 115L,
115L, 115L, 115L, 115L, 115L), wday = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), yday = c(67L, 67L, 67L, 67L, 67L, 67L, 67L,
67L, 67L, 67L, 67L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 66L,
66L, 66L, 66L, 66L), isdst = c(-1L, 0L, -1L, 0L, -1L, -1L,
-1L, -1L, -1L, 1L, -1L, -1L, 0L, -1L, -1L, -1L, -1L, -1L,
-1L, -1L, -1L, -1L, -1L), zone = c("", "CST", "", "CST",
"", "", "", "", "", "CDT", "", "", "CST", "", "", "", "",
"", "", "", "", "", ""), gmtoff = c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_)), .Names = c("sec", "min", "hour", "mday", "mon",
"year", "wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"), tzone = c("CST6CDT", "CST", "CDT")), end = structure(list(
sec = c(7, 59, 38, 45, 29, 46, 39, 14, 52, 29, 37, 5, 23,
41, 10, 43, 46, 46, 53, 24, 57, 13, 51), min = c(55L, 47L,
30L, 2L, 43L, 51L, 53L, 56L, 54L, 54L, 57L, 56L, 6L, 3L,
13L, 29L, 37L, 32L, 48L, 47L, 55L, 55L, 55L), hour = c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), mday = c(9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 8L), mon = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
year = c(114L, 114L, 114L, 114L, 114L, 114L, 114L, 114L,
114L, 114L, 114L, 115L, 115L, 115L, 115L, 115L, 115L, 115L,
115L, 115L, 115L, 115L, 115L), wday = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), yday = c(67L, 67L, 67L, 67L, 67L, 67L, 67L,
67L, 67L, 67L, 67L, 66L, 66L, 66L, 66L, 66L, 66L, 66L, 66L,
66L, 66L, 66L, 66L), isdst = c(-1L, -1L, -1L, -1L, -1L, -1L,
-1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L, -1L,
-1L, -1L, -1L, -1L, -1L), zone = c("", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", ""), gmtoff = c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
)), .Names = c("sec", "min", "hour", "mday", "mon", "year",
"wday", "yday", "isdst", "zone", "gmtoff"), class = c("POSIXlt",
"POSIXt"), tzone = c("CST6CDT", "CST", "CDT"))), .Names = c("start",
"end"), row.names = c(1559963L, 1560092L, 1560157L, 1560220L,
1560240L, 1560247L, 1560252L, 1560253L, 1560255L, 1560258L, 1560260L,
2004432L, 2004583L, 2004591L, 2004594L, 2004596L, 2004598L, 2004599L,
2004600L, 2004603L, 2004609L, 2004610L, 2004611L), class = "data.frame")
Solution
How does is.na work in this context?
> is.na.POSIXlt
function (x)
is.na(as.POSIXct(x))
<bytecode: 0x0000000014232980>
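As a small illustration of that definition, the sketch below uses a hypothetical two-element vector (not the data from the question) to show that is.na on a POSIXlt vector gives one answer per timestamp, and that the answer matches what as.POSIXct followed by is.na produces.

```r
# Hypothetical vector: one ordinary timestamp and one missing value.
x <- as.POSIXlt(c("2014-03-09 01:35:01", NA), tz = "CST6CDT")

# is.na() on a POSIXlt vector returns one logical per timestamp,
# not one per internal component (sec, min, hour, ...).
na_direct <- is.na(x)

# The same result comes from converting to POSIXct first.
na_via_ct <- is.na(as.POSIXct(x))

print(na_direct)
print(identical(na_direct, na_via_ct))
```

So whenever as.POSIXct cannot turn a POSIXlt element into a valid instant, is.na reports that element as NA, even though its fields still hold numbers.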
How does as.POSIXct behave here?
> as.POSIXct(check$start)
 [1] NA "2014-03-09 01:35:01 CST" NA "2014-03-09 01:53:30 CST"
 [5] NA NA NA NA
 [9] NA "2014-03-09 04:17:11 CDT" NA NA
[13] "2015-03-08 01:54:43 CST" NA NA NA
[17] NA NA NA NA
[21] NA NA NA
OK, but why?
Let's check the documentation of as.POSIXct:
Any conversion that needs to go between the two date-time classes requires a time zone: conversion from "POSIXlt" to "POSIXct" will validate times in the selected time zone. One issue is what happens at transitions to and from DST, for example in the UK
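To make the DST transition concrete, here is a minimal sketch that measures the elapsed time across the 2014 spring-forward boundary in the zone from the question. It assumes the "CST6CDT" zone is available in the local time zone database. If it is not, R typically falls back to UTC with a warning and the result will differ.

```r
# Assumption: the "CST6CDT" zone used in the question is known to this system.
tz <- "CST6CDT"

# Last second before the switch and first instant after it (2014-03-09).
before <- as.POSIXct("2014-03-09 01:59:59", tz = tz)
after  <- as.POSIXct("2014-03-09 03:00:00", tz = tz)

# Elapsed time between the two wall-clock readings, in seconds.
gap_secs <- as.numeric(difftime(after, before, units = "secs"))
print(gap_secs)
```

With the zone data in place, the two readings should be only one second apart, because the wall clock skips the whole 02:00 hour on that date.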
Let's see:
> check$start$zone
 [1] "" "CST" "" "CST" "" "" "" "" "" "CDT" "" "" "CST" "" "" "" "" "" "" ""
[21] "" "" ""
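And here be dragons: there is no time zone except for 4 entries, so as.POSIXct can't tell whether the dates are valid (inside a DST change or not?), as you can see with:
> check$start$isdst
 [1] -1  0 -1  0 -1 -1 -1 -1 -1  1 -1 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Every row with isdst = -1 has hour 2 on 2014-03-09 or 2015-03-08, the dates on which clocks in CST6CDT jump from 02:00 to 03:00, so those wall-clock times do not exist in that zone.
So the conversion from POSIXlt (your data frame) to POSIXct can't decide whether the date is valid, and returns NA.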
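One way to spot such rows without converting anything is to look at the stored fields directly. The sketch below uses three hypothetical timestamps in the style of the question's data and relies on the documented fact that strptime does not validate wall-clock times against DST rules. The `in_gap_hour` name is purely illustrative.

```r
# Hypothetical sample: one time inside the skipped hour, two outside it.
stamps <- c("2014-03-09 02:45:24",
            "2014-03-09 01:35:01",
            "2014-03-09 04:17:11")

# strptime() keeps the fields as written, without DST validation.
x <- strptime(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "CST6CDT")

# Flag rows whose wall-clock hour is the one skipped on the
# spring-forward date (02:00 to 02:59 local time).
in_gap_hour <- x$hour == 2L

print(data.frame(stamp = stamps, hour = x$hour, in_gap_hour = in_gap_hour))
```

On the real data, the same test would also need to check the date, since an hour of 2 is only a problem on the day the clocks change.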
One way to fix this could be to enforce a time zone on all records:
> check$start <- as.POSIXlt(strftime(check$start,tz="CST"),tz="CST6CDT")
> is.na(check$start)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
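A different option, sketched here under a strong assumption, applies if the timestamps were recorded by a clock that never observed DST. The question does not confirm this. In that case the original character values can be parsed in a zone without DST, such as UTC, where every wall-clock time exists. The sample strings are hypothetical.

```r
# Hypothetical character timestamps, two of them inside the skipped hour
# of the CST6CDT zone.
stamps <- c("2014-03-09 02:45:24",
            "2014-03-09 01:35:01",
            "2015-03-08 02:53:43")

# Assumption: the source clock did not observe DST, so a zone without
# DST transitions (here UTC) is used to keep the wall-clock fields intact.
fixed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(is.na(fixed))
```

This keeps the displayed clock readings but not the true instants, so it only suits cases where the differences between times matter more than their absolute position.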