Modified date inside data.frame becomes <NA> after selection

Published: 2017/3/26 3:42:03 | Tags: r, date, dataframe

Problem description

I have a data.frame d containing some POSIX dates whose year I want to change with d$date$year <- 100. This seems to work fine at first, but after selecting some rows of this data.frame, all but the first modified date are converted to <NA>. What am I doing wrong here? See the code below. (R-Fiddle)
date <- c("2014-01-01","2015-01-02","2016-01-03")
val <- c("a","b","c")
d <- data.frame(date,val)
d$date <- strptime(d$date,format="%Y-%m-%d")
d 
#        date val
#1 2014-01-01   a
#2 2015-01-02   b
#3 2016-01-03   c
# correct date as expected

d[c(TRUE,TRUE,TRUE),] 
#        date val
#1 2014-01-01   a
#2 2015-01-02   b
#3 2016-01-03   c
# correct dates as expected

d$date2000 <- d$date
d$date2000$year <- 100 # set year to 2000

d 
#        date val   date2000
#1 2014-01-01   a 2000-01-01
#2 2015-01-02   b 2000-01-02
#3 2016-01-03   c 2000-01-03
# correct dates as expected

d[c(TRUE,TRUE,TRUE),] 
#        date val   date2000
#1 2014-01-01   a 2000-01-01
#2 2015-01-02   b       <NA>
#3 2016-01-03   c       <NA>
# first entry correct, second and third entry <NA>

Solution

When does this problem occur?

It seems to occur during the call to function [.data.frame (see d[c(TRUE,TRUE,TRUE),] but also d[1:3,] or even d[3,]). Here is the definition of that function:
> `[.data.frame`
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 
    1) 
{
    mdrop <- missing(drop)
    Narg <- nargs() - (!mdrop)
    has.j <- !missing(j)
    if (!all(names(sys.call()) %in% c("", "drop")) && !isS4(x)) 
        warning("named arguments other than 'drop' are discouraged")
    if (Narg < 3L) {
        if (!mdrop) 
            warning("'drop' argument will be ignored")
        if (missing(i)) 
            return(x)
        if (is.matrix(i)) 
            return(as.matrix(x)[i])
        nm <- names(x)
        if (is.null(nm)) 
            nm <- character()
        if (!is.character(i) && anyNA(nm)) {
            names(nm) <- names(x) <- seq_along(x)
            y <- NextMethod("[")
            cols <- names(y)
            if (anyNA(cols)) 
                stop("undefined columns selected")
            cols <- names(y) <- nm[cols]
        }
        else {
            y <- NextMethod("[")
            cols <- names(y)
            if (!is.null(cols) && anyNA(cols)) 
                stop("undefined columns selected")
        }
        if (anyDuplicated(cols)) 
            names(y) <- make.unique(cols)
        attr(y, "row.names") <- .row_names_info(x, 0L)
        attr(y, "class") <- oldClass(x)
        return(y)
    }
    if (missing(i)) {
        if (drop && !has.j && length(x) == 1L) 
            return(.subset2(x, 1L))
        nm <- names(x)
        if (is.null(nm)) 
            nm <- character()
        if (has.j && !is.character(j) && anyNA(nm)) {
            names(nm) <- names(x) <- seq_along(x)
            y <- .subset(x, j)
            cols <- names(y)
            if (anyNA(cols)) 
                stop("undefined columns selected")
            cols <- names(y) <- nm[cols]
        }
        else {
            y <- if (has.j) 
                .subset(x, j)
            else x
            cols <- names(y)
            if (anyNA(cols)) 
                stop("undefined columns selected")
        }
        if (drop && length(y) == 1L) 
            return(.subset2(y, 1L))
        if (anyDuplicated(cols)) 
            names(y) <- make.unique(cols)
        nrow <- .row_names_info(x, 2L)
        if (drop && !mdrop && nrow == 1L) 
            return(structure(y, class = NULL, row.names = NULL))
        else {
            attr(y, "class") <- oldClass(x)
            attr(y, "row.names") <- .row_names_info(x, 0L)
            return(y)
        }
    }
    xx <- x
    cols <- names(xx)
    x <- vector("list", length(x))
    x <- .Internal(copyDFattr(xx, x))
    oldClass(x) <- attr(x, "row.names") <- NULL
    if (has.j) {
        nm <- names(x)
        if (is.null(nm)) 
            nm <- character()
        if (!is.character(j) && anyNA(nm)) 
            names(nm) <- names(x) <- seq_along(x)
        x <- x[j]
        cols <- names(x)
        if (drop && length(x) == 1L) {
            if (is.character(i)) {
                rows <- attr(xx, "row.names")
                i <- pmatch(i, rows, duplicates.ok = TRUE)
            }
            xj <- .subset2(.subset(xx, j), 1L)
            return(if (length(dim(xj)) != 2L) xj[i] else xj[i, 
                , drop = FALSE])
        }
        if (anyNA(cols)) 
            stop("undefined columns selected")
        if (!is.null(names(nm))) 
            cols <- names(x) <- nm[cols]
        nxx <- structure(seq_along(xx), names = names(xx))
        sxx <- match(nxx[j], seq_along(xx))
    }
    else sxx <- seq_along(x)
    rows <- NULL
    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
    for (j in seq_along(x)) {
        xj <- xx[[sxx[j]]]
        x[[j]] <- if (length(dim(xj)) != 2L) 
            xj[i]
        else xj[i, , drop = FALSE]
    }
    if (drop) {
        n <- length(x)
        if (n == 1L) 
            return(x[[1L]])
        if (n > 1L) {
            xj <- x[[1L]]
            nrow <- if (length(dim(xj)) == 2L) 
                dim(xj)[1L]
            else length(xj)
            drop <- !mdrop && nrow == 1L
        }
        else drop <- FALSE
    }
    if (!drop) {
        if (is.null(rows)) 
            rows <- attr(xx, "row.names")
        rows <- rows[i]
        if ((ina <- anyNA(rows)) | (dup <- anyDuplicated(rows))) {
            if (!dup && is.character(rows)) 
                dup <- "NA" %in% rows
            if (ina) 
                rows[is.na(rows)] <- "NA"
            if (dup) 
                rows <- make.unique(as.character(rows))
        }
        if (has.j && anyDuplicated(nm <- names(x))) 
            names(x) <- make.unique(nm)
        if (is.null(rows)) 
            rows <- attr(xx, "row.names")[i]
        attr(x, "row.names") <- rows
        oldClass(x) <- oldClass(xx)
    }
    x
}
<bytecode: 0x7fe8cc3a5548>
<environment: namespace:base>
The relevant bit happens here:
for (j in seq_along(x)) {
            xj <- xx[[sxx[j]]]
            x[[j]] <- if (length(dim(xj)) != 2L) 
                xj[i]
            else xj[i, , drop = FALSE]
        }
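The loop above can be imitated outside of base R to see what it does to each column. The following is only a simplified sketch: select_rows_sketch is a hypothetical helper written for illustration, it ignores row names, drop handling and column selection, and it returns a plain list rather than a data.frame.

```r
# Hypothetical helper (not part of base R): mimics only the per-column
# subsetting step of `[.data.frame`, nothing else.
select_rows_sketch <- function(df, i) {
  cols <- as.list(df)                       # one list element per column
  out <- vector("list", length(cols))
  names(out) <- names(cols)
  for (j in seq_along(cols)) {
    xj <- cols[[j]]
    # Vectors are subset with xj[i]; matrix-like columns keep two dimensions.
    out[[j]] <- if (length(dim(xj)) != 2L) xj[i] else xj[i, , drop = FALSE]
  }
  out
}

demo <- data.frame(a = 1:3, b = c(10.5, 20.5, 30.5))
res <- select_rows_sketch(demo, 3)
str(res)
```

The point of the sketch is that each column is subset on its own with xj[i], so a column whose own `[` method returns NA carries that NA into the selected rows.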
At this point (in the d[3,] example for instance), we have this:
> str(xx)
'data.frame':   3 obs. of  3 variables:
 $ date    : POSIXlt, format: "2014-01-01" "2015-01-02" "2016-01-03"
 $ val     : Factor w/ 3 levels "a","b","c": 1 2 3
 $ date2000: POSIXlt, format: "2000-01-01" "2000-01-02" "2000-01-03"
> str(x)
List of 3
 $ date    : NULL
 $ val     : NULL
 $ date2000: NULL
> i
[1] 3
> str(sxx)
 int [1:3] 1 2 3
For j=3 we have:
> str(xj)
 POSIXlt[1:3], format: "2000-01-01" "2000-01-02" "2000-01-03"
> dim(xj)
NULL
> xj[3]
[1] NA
So this is where it fails.
I think the problem comes (as you noted) from the fact that you replaced d$date2000$year with a single value instead of three:
> xj$wday
[1] 3 5 0
> xj$year
[1] 100
> xj[3]
[1] NA
> xj$year <- c(100,100,100)
> xj[3]
[1] "2000-01-03 CET"
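One way to see the difference between the two assignments is to compare the lengths of the underlying components. This is a small sketch, assuming the same three dates as in the question; tz = "UTC" and the variable names len_short, len_full and len_mday are choices made here for illustration.

```r
# Same three dates as in the question; the time zone is fixed as an assumption
# so the result does not depend on the local setting.
xj <- strptime(c("2014-01-01", "2015-01-02", "2016-01-03"),
               format = "%Y-%m-%d", tz = "UTC")

xj$year <- 100                          # single value, as in the question
len_short <- length(unclass(xj)$year)   # in the session quoted above, xj$year printed one value

xj$year <- rep(100, 3)                  # one value per date
len_full <- length(unclass(xj)$year)
len_mday <- length(unclass(xj)$mday)    # an untouched component, for comparison

print(c(short = len_short, full = len_full, mday = len_mday))
```

unclass() is used so that the raw list components are inspected directly rather than through any POSIXlt method.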
It seems that when displaying xj (or d), the value of xj$year is recycled across all three dates, but xj[3] extracts the third element of each component, and since year holds only one element the result has no year and becomes NA. Indeed, if we try with two elements instead of one or three, we can see the vector being recycled when printing:
> xj$year <- c(100,101)
> xj
[1] "2000-01-01 CET" "2001-01-02 CET" "2000-01-03 CET"
> xj[2]
[1] "2001-01-02 CET"
> xj[3]
[1] NA
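Following that reasoning, a possible way to avoid the problem is to assign one year value per date. The sketch below does that and, as an additional choice not taken from the answer, stores the result as POSIXct, which is an atomic vector and is generally the more convenient date-time class for data.frame columns. tz = "UTC" and the names lt and sel are assumptions made here; behaviour of the original one-value assignment may differ between R versions.

```r
date <- c("2014-01-01", "2015-01-02", "2016-01-03")
val <- c("a", "b", "c")
d <- data.frame(date, val)
d$date <- strptime(d$date, format = "%Y-%m-%d", tz = "UTC")

lt <- d$date
lt$year <- rep(100, length(lt))   # one value per date instead of a single 100
d$date2000 <- as.POSIXct(lt)      # assumption: keep the column as POSIXct

sel <- d[c(TRUE, TRUE, TRUE), ]
print(format(sel$date2000))
```

If the column has to stay POSIXlt, assigning the full-length vector alone (as in xj$year <- c(100,100,100) above) is the part that matters.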


                        