R: strptime() and is.na() unexpected results
I have a data frame with roughly 8 million rows and 3 columns. I used strptime() in the following manner:
df$date.time <- strptime(df$date.time, "%m/%d/%y %I:%M:%S %p")
This works fine for all but 1104 of the rows, which I checked using
df[is.na(df$date.time), ]
When I look at these "problem" data, the date.time entries seem to be formatted in the way I would expect. For example, here is an observation that comes up as a problem, but doesn't appear to be an NA:
id date.time outcome
observation543490 2012-03-11 02:14:01 C
What could possibly be going on here such that is.na(df$date.time) returns TRUE for this row, which has apparently been converted correctly?
Here's a reproducible example (if you're in CST):
is.na(strptime("03/11/12 2:14:01 AM", "%m/%d/%y %I:%M:%S %p", "CST6CDT"))
#[1] TRUE
The problem is likely that the times that return NA do not exist in the timezone you're using, due to daylight saving time. In the example above, US clocks jumped from 2:00 AM straight to 3:00 AM on March 11, 2012, so 2:14:01 AM never occurred in CST6CDT.
Check with the data source to determine the timezone the data were recorded in, then set the tz argument to that value in your call to strptime().
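One way to check whether daylight saving time explains the NA rows is to parse the same strings twice, once in a zone without DST and once in a zone with it. The following is a minimal sketch, not the original poster's code: the sample vector raw and the names utc, cdt and skipped are made up for illustration, "America/Chicago" stands in for the CST6CDT zone used above, and the AM/PM parsing assumes an English or C locale. Whether a skipped local time comes back as NA depends on the platform's time zone handling, so skipped may be empty on some systems.

```r
# Hypothetical sample: the second string falls in the hour skipped by the
# US spring-forward change on 2012-03-11.
raw <- c("03/10/12 2:14:01 AM", "03/11/12 2:14:01 AM", "03/11/12 3:14:01 AM")
fmt <- "%m/%d/%y %I:%M:%S %p"

# Parse in a zone with no daylight saving time: every wall-clock time exists.
utc <- as.POSIXct(strptime(raw, fmt, tz = "UTC"))

# Parse in a zone that observes daylight saving time (assumed stand-in for
# CST6CDT). The skipped time may be returned as NA, depending on the platform.
cdt <- as.POSIXct(strptime(raw, fmt, tz = "America/Chicago"))

# Strings that parse in UTC but not in the DST zone are well formed; they
# name a local time that does not exist in that zone.
skipped <- raw[!is.na(utc) & is.na(cdt)]
print(skipped)
```

If the strings listed in skipped all fall in the spring-forward hour, the data were probably recorded in a zone other than the one used for parsing, and passing the correct tz to strptime() should make the NA values disappear.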