Why does dplyr's mutate() change the time format?


Question

I use readr to read in data that contains a date-time column. I can read it in correctly using the col_types option of readr.

library(dplyr)
library(readr)

sample <- "time,id
2015-03-05 02:28:11,1674
2015-03-03 13:10:59,36749
2015-03-05 07:55:48,NA
2015-03-05 06:13:19,NA
"

mydf <- read_csv(sample, col_types="Ti")
mydf
                 time    id
1 2015-03-05 02:28:11  1674
2 2015-03-03 13:10:59 36749
3 2015-03-05 07:55:48    NA
4 2015-03-05 06:13:19    NA

This works well. However, if I manipulate this column with dplyr, the time column loses its format.

mydf %>% mutate(time = ifelse(is.na(id), NA, time))
        time    id
1 1425522491  1674
2 1425388259 36749
3         NA    NA
4         NA    NA

Why does this happen?

I know I can work around this problem by converting the column to character first, but it would be more convenient not to convert back and forth.

mydf %>% mutate(time = as.character(time)) %>% 
    mutate(time = ifelse(is.na(id), NA, time))

Recommended answer

It's actually ifelse() that is causing this issue, not dplyr::mutate(). ifelse() strips attributes, including the class, from its result, and help(ifelse) shows an example of the problem:

## ifelse() strips attributes
## This is important when working with Dates and factors
x <- seq(as.Date("2000-02-29"), as.Date("2004-10-04"), by = "1 month")
## has many "yyyy-mm-29", but a few "yyyy-03-01" in the non-leap years
y <- ifelse(as.POSIXlt(x)$mday == 29, x, NA)
head(y) # not what you expected ... ==> need restore the class attribute:
class(y) <- class(x)
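
The same effect can be reproduced with a date-time vector like the one in the question. The following is a minimal base R sketch, not part of the original answer; the vector names (time, id, stripped, restored) are illustrative, and the time zone is fixed to UTC here as an assumption so the result does not depend on the local setting.

```r
# Illustrative POSIXct vector; tz = "UTC" is an assumption made for reproducibility
time <- as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                     "2015-03-05 07:55:48"), tz = "UTC")
id <- c(1674L, 36749L, NA)

# ifelse() builds its result from the logical test vector, so the
# POSIXct class and the tzone attribute of 'time' are dropped
stripped <- ifelse(is.na(id), NA, time)
class(stripped)  # "numeric"

# One way to get a date-time back: convert the seconds since the epoch again
restored <- as.POSIXct(stripped, origin = "1970-01-01", tz = "UTC")
class(restored)  # "POSIXct" "POSIXt"
```

The numbers in stripped are seconds since 1970-01-01 UTC, which is why the column in the question printed as values such as 1425522491.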

So there you have it. It's a bit of extra work if you want to use ifelse(). Here are two possible methods that will get you to your desired result without ifelse(). The first is really simple and uses is.na<-.

## mark 'time' as NA if 'id' is NA
is.na(mydf$time) <- is.na(mydf$id)

## resulting in
mydf
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA

If you don't wish to take that route and want to stay with the dplyr approach, you can use replace() instead of ifelse().

mydf %>% mutate(time = replace(time, is.na(id), NA))
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA
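
Another option that stays within dplyr, not mentioned in the original answer, is dplyr's own if_else(), which is stricter about types than ifelse() and keeps the class of its inputs. The sketch below assumes dplyr is installed; df_demo and out are hypothetical names, and the data frame is rebuilt inline (with tz = "UTC" as an assumption) so the block is self-contained. Because if_else() requires both branches to have compatible types, the missing value is supplied as as.POSIXct(NA) rather than a bare NA.

```r
library(dplyr)

# Hypothetical stand-in for 'mydf'; tz = "UTC" is an assumption
df_demo <- data.frame(
  time = as.POSIXct(c("2015-03-05 02:28:11", "2015-03-03 13:10:59",
                      "2015-03-05 07:55:48", "2015-03-05 06:13:19"),
                    tz = "UTC"),
  id = c(1674L, 36749L, NA, NA)
)

# Both branches are POSIXct, so the result stays a date-time column
out <- df_demo %>%
  mutate(time = if_else(is.na(id), as.POSIXct(NA), time))

class(out$time)  # "POSIXct" "POSIXt"
```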

Data:

mydf <- structure(list(time = structure(c(1425551291, 1425417059, 1425570948, 
1425564799), class = c("POSIXct", "POSIXt"), tzone = ""), id = c(1674L, 
36749L, NA, NA)), .Names = c("time", "id"), class = "data.frame", row.names = c(NA, 
-4L))
