lubridate converting midnight timestamp returns NA: how to fill missing timestamp


Question

I have a data frame in R that I have imported from a CSV. The "time" format in the csv is "%Y-%m-%d %H:%M:%S" like so:

> head(btc_data)
                 time  btc_price
1 2017-08-27 22:50:00 4,389.6113
2 2017-08-27 22:51:00 4,389.0850
3 2017-08-27 22:52:00 4,388.8625
4 2017-08-27 22:53:00 4,389.7888
5 2017-08-27 22:56:00 4,389.9138
6 2017-08-27 22:57:00 4,390.1663

When I run str(btc_data) the time column comes back as a factor. So, I have converted this to datetime using the lubridate package as follows:

btc_data$time <- ymd_hms(as.character(btc_data$time)) 

The problem is that the rows collected at midnight (5 rows) fail to parse and return NA, as shown below. In the original data the time part is missing from these rows, so 2017-08-29 00:00:00 is listed simply as 2017-08-29:

724 2017-08-28 23:59:00  4,439.3313
725 NA                   4,439.6588
726 2017-08-29 00:01:00  4,440.3050

Moreover, the second data frame is organized differently:

> str(eth_data)
'data.frame':   1081 obs. of  2 variables:
 $ time     : Factor w/ 1081 levels "8/28/17 16:19",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ eth_price: num  344 344 344 344 343 ...

When I try:

> eth_data$time <- mdy_hms(as.character(eth_data$time))

I get the following warning:

Warning message: All formats failed to parse. No formats found.

EDIT: I have isolated the code issue that is causing the problem:

> btc_data[721:726,]
                   time  btc_price
721 2017-08-28 23:57:00 4,439.8163
722 2017-08-28 23:58:00 4,440.2363
723 2017-08-28 23:58:00 4,440.2363
724 2017-08-28 23:59:00 4,439.3313
725 2017-08-29          4,439.6588
726 2017-08-29 00:01:00 4,440.3050

So each time the clock strikes midnight, the time part of the timestamp is not recorded. The CSV is being created via a data stream and is constantly growing, so this issue will recur with each new day unless I can find a workaround. Any suggestions?

Recommended answer

If the '00:00:00' is missing from the original data to begin with, you can use grep to find those rows and paste '00:00:00' onto them before calling the ymd_hms() or mdy_hm() function.

First case, where date/time format is 'YYYY-mm-dd HH:MM:SS':

# Before
library(data.table)  # provides fread()
library(lubridate)   # provides ymd_hms() and mdy_hm()
test <- fread("time,  btc_price
2017-08-28 23:57:00, 4439.8163
2017-08-28 23:58:00, 4440.2363
2017-08-28 23:58:00, 4440.2363
2017-08-28 23:59:00, 4439.3313
2017-08-29         , 4439.6588
2017-08-29 00:01:00, 4440.3050")

# Find the rows that contain only a date, then append the midnight time
idx <- grep("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", test$time)
test$time[idx] <- paste(test$time[idx], "00:00:00")

#After
print(test)

                  time btc_price
1: 2017-08-28 23:57:00  4439.816
2: 2017-08-28 23:58:00  4440.236
3: 2017-08-28 23:58:00  4440.236
4: 2017-08-28 23:59:00  4439.331
5: 2017-08-29 00:00:00  4439.659
6: 2017-08-29 00:01:00  4440.305

# Now you can use ymd_hms(as.character(test$time)) as usual.
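
As an alternative to patching the strings first, lubridate's parsers accept a `truncated` argument that tolerates missing trailing components. The sketch below assumes the date-only rows contain nothing but the date; the vector `x` is made up for illustration and is not the asker's data.

```r
library(lubridate)

# Illustrative input: the middle value has no time part
x <- c("2017-08-28 23:59:00", "2017-08-29", "2017-08-29 00:01:00")

# truncated = 3 allows up to three trailing components (hours, minutes,
# seconds) to be absent; a missing time is then read as midnight
parsed <- ymd_hms(x, truncated = 3)

print(parsed)
```

This avoids the regular expression entirely, at the cost of also accepting partially truncated values such as a timestamp with hours only, which may or may not be desirable for a data stream.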

Second case, where date/time format is 'm/dd/yy HH:MM':

#Step 1 is to find/replace:
test <- fread("time,  btc_price
8/28/17 23:57, 4439.8163
8/28/17 23:57, 4440.2363
8/28/17 23:57, 4440.2363
8/28/17 23:57, 4439.3313
8/28/17      , 4439.6588
8/29/17 00:01, 4440.3050")

# Find the rows that contain only a date (one- or two-digit month and day),
# then append the midnight time
idx <- grep("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$", test$time)
test$time[idx] <- paste(test$time[idx], "00:00")

print(test)
            time btc_price
1: 8/28/17 23:57  4439.816
2: 8/28/17 23:57  4440.236
3: 8/28/17 23:57  4440.236
4: 8/28/17 23:57  4439.331
5: 8/28/17 00:00  4439.659
6: 8/29/17 00:01  4440.305

#Step 2 is to adjust your mdy_hms() command; these timestamps carry no seconds, so you need to leave off the 's':
#Ex. before:
mdy_hms(as.character("8/28/17 16:19"))
[1] NA
Warning message:
All formats failed to parse. No formats found. 

#After
test <- c("8/28/17 16:19","8/28/17 00:00")
mdy_hm(as.character(test))
[1] "2017-08-28 16:19:00 UTC" "2017-08-28 00:00:00 UTC"
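
Because the CSV keeps growing, the find-and-paste step may be easier to reuse as a small function. The following is a minimal sketch in base R; `pad_midnight` is a hypothetical helper name, not something lubridate provides, and it assumes a value has a time part exactly when it contains something shaped like `HH:MM`.

```r
# Hypothetical helper (not part of lubridate): append a midnight time to
# values that hold a date but no time part.
pad_midnight <- function(x, suffix = "00:00:00") {
  x <- trimws(as.character(x))                 # factor -> character, drop padding
  has_time <- grepl("[0-9]{1,2}:[0-9]{2}", x)  # anything shaped like HH:MM
  date_only <- !is.na(x) & nzchar(x) & !has_time
  x[date_only] <- paste(x[date_only], suffix)
  x
}

# Illustrative inputs modelled on the two formats in the question
btc_times <- c("2017-08-28 23:59:00", "2017-08-29         ", "2017-08-29 00:01:00")
eth_times <- factor(c("8/28/17 23:57", "8/29/17", "8/29/17 00:01"))

btc_fixed <- pad_midnight(btc_times)           # seconds present, so "00:00:00"
eth_fixed <- pad_midnight(eth_times, "00:00")  # no seconds, so "00:00"

print(btc_fixed)
print(eth_fixed)
```

The padded vectors can then be passed to ymd_hms() and mdy_hm() respectively, as in the steps above.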

In general, it's also good practice to store numbers without thousands separators before reading them into R, so 4,439.3313 should be 4439.3313. Otherwise R may read the column as text rather than numeric or, if the field is not quoted, treat the comma as a separator between columns.
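
If the source file cannot be changed, the separators can be stripped after import. This is a small base R sketch; `btc_price_raw` is an illustrative vector using values from the question, assumed to have been read in as text.

```r
# Prices as they appear in the question, read in as text
btc_price_raw <- c("4,389.6113", "4,389.0850", "4,439.6588")

# Remove the thousands separators, then convert to numeric
btc_price <- as.numeric(gsub(",", "", btc_price_raw, fixed = TRUE))

print(btc_price)
```

If the column was imported as a factor, wrap it in as.character() before calling gsub().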
