Why does dplyr's mutate change the time format?

Question

I use readr to read in data that contains a date-time column. I can read it in correctly using the col_types option of readr.

library(dplyr)
library(readr)

sample <- "time,id
2015-03-05 02:28:11,1674
2015-03-03 13:10:59,36749
2015-03-05 07:55:48,NA
2015-03-05 06:13:19,NA
"

mydf <- read_csv(sample, col_types="Ti")
mydf
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3 2015-03-05 07:55:48    NA
# 4 2015-03-05 06:13:19    NA

This works well. However, if I manipulate this column with dplyr, the time column loses its format.

mydf %>% mutate(time = ifelse(is.na(id), NA, time))
#         time    id
# 1 1425522491  1674
# 2 1425388259 36749
# 3         NA    NA
# 4         NA    NA

Why does this happen?

I know I can work around this problem by converting the column to character first, but it would be more convenient to avoid converting back and forth.

mydf %>% mutate(time = as.character(time)) %>% 
    mutate(time = ifelse(is.na(id), NA, time))


Recommended answer

It's actually ifelse() that causes the issue, not dplyr::mutate(). ifelse() strips attributes, including the class, so the POSIXct column comes back as the plain numeric seconds underneath it. help(ifelse) shows an example of this attribute stripping:


## ifelse() strips attributes
## This is important when working with Dates and factors
x <- seq(as.Date("2000-02-29"), as.Date("2004-10-04"), by = "1 month")
## has many "yyyy-mm-29", but a few "yyyy-03-01" in the non-leap years
y <- ifelse(as.POSIXlt(x)$mday == 29, x, NA)
head(y) # not what you expected ... ==> need restore the class attribute:
class(y) <- class(x)
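
The same stripping applies to a POSIXct column like the one in the question. The following is a minimal base R sketch, not part of the original answer; the two-element vector `x`, the `keep` flags, and the UTC time zone are assumptions made for illustration. It shows that ifelse() returns the underlying seconds since the epoch, and one way to turn them back into date-times:

```r
# Hypothetical two-element date-time vector, for illustration only
x <- as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59"), tz = "UTC")
keep <- c(TRUE, FALSE)

# ifelse() builds its result from the logical test vector, so the
# class and tzone attributes of 'x' are dropped along the way
y <- ifelse(keep, x, NA)
class(y)

# Rebuild a POSIXct vector from the seconds since 1970-01-01
z <- as.POSIXct(y, origin = "1970-01-01", tz = "UTC")
class(z)
```

Here `class(y)` is numeric, while `z` is a POSIXct vector again, with the second element still missing.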


So there you have it. It's a bit of extra work if you want to use ifelse(). Here are two possible methods that get you the desired result without ifelse(). The first is really simple and uses is.na<-.

## mark 'time' as NA if 'id' is NA
is.na(mydf$time) <- is.na(mydf$id)

## resulting in
mydf
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA

If you'd rather not take that route and want to stay with the dplyr approach, you can use replace() instead of ifelse().

mydf %>% mutate(time = replace(time, is.na(id), NA))
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA
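
Another option, not mentioned in the original answer, is dplyr's own if_else(), which checks that both branches have compatible types and keeps the class of the result. The sketch below assumes the dplyr package is installed; it rebuilds the sample data by hand, and the UTC time zone is an assumption:

```r
library(dplyr)

# Sample data from the question, rebuilt by hand (UTC is an assumption)
mydf <- data.frame(
  time = as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                      "2015-03-05 07:55:48", "2015-03-05 06:13:19"),
                    tz = "UTC"),
  id = c(1674L, 36749L, NA, NA)
)

# if_else() is strict about types, so the missing value is supplied as a
# POSIXct NA instead of a bare logical NA
res <- mydf %>%
  mutate(time = if_else(is.na(id), as.POSIXct(NA, tz = "UTC"), time))

class(res$time)
```

The `time` column stays POSIXct, with rows 3 and 4 set to NA, so no conversion to character and back is needed.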

Data:

mydf <- structure(list(time = structure(c(1425551291, 1425417059, 1425570948, 
1425564799), class = c("POSIXct", "POSIXt"), tzone = ""), id = c(1674L, 
36749L, NA, NA)), .Names = c("time", "id"), class = "data.frame", row.names = c(NA, 
-4L))
