Modified date inside data.frame becomes <NA> after selection

Problem description

I have a data.frame d containing some POSIX dates, and I want to modify their year with d$date$year <- 100. This seems to work fine at first; however, after selecting some rows of this data.frame, all but the first modified dates are converted to <NA>. What am I doing wrong here? See the code below. (R-Fiddle)

date <- c("2014-01-01","2015-01-02","2016-01-03")
val <- c("a","b","c")
d <- data.frame(date,val)
d$date <- strptime(d$date,format="%Y-%m-%d")
d 
#        date val
#1 2014-01-01   a
#2 2015-01-02   b
#3 2016-01-03   c
# correct dates as expected

d[c(TRUE,TRUE,TRUE),] 
#        date val
#1 2014-01-01   a
#2 2015-01-02   b
#3 2016-01-03   c
# correct dates as expected

d$date2000 <- d$date
d$date2000$year <- 100 # set year to 2000

d 
#        date val   date2000
#1 2014-01-01   a 2000-01-01
#2 2015-01-02   b 2000-01-02
#3 2016-01-03   c 2000-01-03
# correct dates as expected

d[c(TRUE,TRUE,TRUE),] 
#        date val   date2000
#1 2014-01-01   a 2000-01-01
#2 2015-01-02   b       <NA>
#3 2016-01-03   c       <NA>
# first entry correct, second and third entry <NA>

Solution

When does this problem occur?

It seems to occur during the call to the function [.data.frame (triggered by d[c(TRUE,TRUE,TRUE),], but also by d[1:3,] or even d[3,]). Here is the definition of that function:

> `[.data.frame`
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 
    1) 
{
    mdrop <- missing(drop)
    Narg <- nargs() - (!mdrop)
    has.j <- !missing(j)
    if (!all(names(sys.call()) %in% c("", "drop")) && !isS4(x)) 
        warning("named arguments other than 'drop' are discouraged")
    if (Narg < 3L) {
        if (!mdrop) 
            warning("'drop' argument will be ignored")
        if (missing(i)) 
            return(x)
        if (is.matrix(i)) 
            return(as.matrix(x)[i])
        nm <- names(x)
        if (is.null(nm)) 
            nm <- character()
        if (!is.character(i) && anyNA(nm)) {
            names(nm) <- names(x) <- seq_along(x)
            y <- NextMethod("[")
            cols <- names(y)
            if (anyNA(cols)) 
                stop("undefined columns selected")
            cols <- names(y) <- nm[cols]
        }
        else {
            y <- NextMethod("[")
            cols <- names(y)
            if (!is.null(cols) && anyNA(cols)) 
                stop("undefined columns selected")
        }
        if (anyDuplicated(cols)) 
            names(y) <- make.unique(cols)
        attr(y, "row.names") <- .row_names_info(x, 0L)
        attr(y, "class") <- oldClass(x)
        return(y)
    }
    if (missing(i)) {
        if (drop && !has.j && length(x) == 1L) 
            return(.subset2(x, 1L))
        nm <- names(x)
        if (is.null(nm)) 
            nm <- character()
        if (has.j && !is.character(j) && anyNA(nm)) {
            names(nm) <- names(x) <- seq_along(x)
            y <- .subset(x, j)
            cols <- names(y)
            if (anyNA(cols)) 
                stop("undefined columns selected")
            cols <- names(y) <- nm[cols]
        }
        else {
            y <- if (has.j) 
                .subset(x, j)
            else x
            cols <- names(y)
            if (anyNA(cols)) 
                stop("undefined columns selected")
        }
        if (drop && length(y) == 1L) 
            return(.subset2(y, 1L))
        if (anyDuplicated(cols)) 
            names(y) <- make.unique(cols)
        nrow <- .row_names_info(x, 2L)
        if (drop && !mdrop && nrow == 1L) 
            return(structure(y, class = NULL, row.names = NULL))
        else {
            attr(y, "class") <- oldClass(x)
            attr(y, "row.names") <- .row_names_info(x, 0L)
            return(y)
        }
    }
    xx <- x
    cols <- names(xx)
    x <- vector("list", length(x))
    x <- .Internal(copyDFattr(xx, x))
    oldClass(x) <- attr(x, "row.names") <- NULL
    if (has.j) {
        nm <- names(x)
        if (is.null(nm)) 
            nm <- character()
        if (!is.character(j) && anyNA(nm)) 
            names(nm) <- names(x) <- seq_along(x)
        x <- x[j]
        cols <- names(x)
        if (drop && length(x) == 1L) {
            if (is.character(i)) {
                rows <- attr(xx, "row.names")
                i <- pmatch(i, rows, duplicates.ok = TRUE)
            }
            xj <- .subset2(.subset(xx, j), 1L)
            return(if (length(dim(xj)) != 2L) xj[i] else xj[i, 
                , drop = FALSE])
        }
        if (anyNA(cols)) 
            stop("undefined columns selected")
        if (!is.null(names(nm))) 
            cols <- names(x) <- nm[cols]
        nxx <- structure(seq_along(xx), names = names(xx))
        sxx <- match(nxx[j], seq_along(xx))
    }
    else sxx <- seq_along(x)
    rows <- NULL
    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
    for (j in seq_along(x)) {
        xj <- xx[[sxx[j]]]
        x[[j]] <- if (length(dim(xj)) != 2L) 
            xj[i]
        else xj[i, , drop = FALSE]
    }
    if (drop) {
        n <- length(x)
        if (n == 1L) 
            return(x[[1L]])
        if (n > 1L) {
            xj <- x[[1L]]
            nrow <- if (length(dim(xj)) == 2L) 
                dim(xj)[1L]
            else length(xj)
            drop <- !mdrop && nrow == 1L
        }
        else drop <- FALSE
    }
    if (!drop) {
        if (is.null(rows)) 
            rows <- attr(xx, "row.names")
        rows <- rows[i]
        if ((ina <- anyNA(rows)) | (dup <- anyDuplicated(rows))) {
            if (!dup && is.character(rows)) 
                dup <- "NA" %in% rows
            if (ina) 
                rows[is.na(rows)] <- "NA"
            if (dup) 
                rows <- make.unique(as.character(rows))
        }
        if (has.j && anyDuplicated(nm <- names(x))) 
            names(x) <- make.unique(nm)
        if (is.null(rows)) 
            rows <- attr(xx, "row.names")[i]
        attr(x, "row.names") <- rows
        oldClass(x) <- oldClass(xx)
    }
    x
}
<bytecode: 0x7fe8cc3a5548>
<environment: namespace:base>

The relevant part is this loop:

for (j in seq_along(x)) {
    xj <- xx[[sxx[j]]]
    x[[j]] <- if (length(dim(xj)) != 2L) 
        xj[i]
    else xj[i, , drop = FALSE]
}
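To see why only the modified column is affected, here is a simplified, hypothetical re-creation of that loop. The helper subset_rows is not part of base R; it ignores row names, drop and column selection, and the tz = "UTC" argument is an added assumption so the output does not depend on the local time zone. It only illustrates that row selection is delegated to each column's own [ method, so a POSIXlt column is subset by [.POSIXlt.

```r
# Hypothetical helper (not base R): a stripped-down version of the
# per-column loop inside `[.data.frame`, ignoring row names, `drop` and `j`.
subset_rows <- function(df, i) {
  cols <- lapply(seq_along(df), function(j) {
    xj <- df[[j]]  # the column keeps its class, e.g. POSIXlt or character
    # vectors use their own `[` method (e.g. `[.POSIXlt`); matrices keep rows
    if (length(dim(xj)) != 2L) xj[i] else xj[i, , drop = FALSE]
  })
  names(cols) <- names(df)
  cols
}

date <- c("2014-01-01", "2015-01-02", "2016-01-03")
d <- data.frame(date, val = c("a", "b", "c"), stringsAsFactors = FALSE)
d$date <- strptime(d$date, format = "%Y-%m-%d", tz = "UTC")
str(subset_rows(d, 3))
```

Whatever [.POSIXlt returns for a column is exactly what ends up in the selected rows, which is why a malformed POSIXlt surfaces as <NA> only after subsetting.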

At this point (for instance, in the d[3,] example), we have:

> str(xx)
'data.frame':   3 obs. of  3 variables:
 $ date    : POSIXlt, format: "2014-01-01" "2015-01-02" "2016-01-03"
 $ val     : Factor w/ 3 levels "a","b","c": 1 2 3
 $ date2000: POSIXlt, format: "2000-01-01" "2000-01-02" "2000-01-03"
> str(x)
List of 3
 $ date    : NULL
 $ val     : NULL
 $ date2000: NULL
> i
[1] 3
> str(sxx)
 int [1:3] 1 2 3

For j = 3 (the date2000 column), we have:

> str(xj)
 POSIXlt[1:3], format: "2000-01-01" "2000-01-02" "2000-01-03"
> dim(xj)
NULL
> xj[3]
[1] NA

So this is where it fails. I think the problem comes (as you noted) from the fact that you replaced d$date2000$year with a single value instead of three:

> xj$wday
[1] 3 5 0
> xj$year
[1] 100
> xj[3]
[1] NA
> xj$year <- c(100,100,100)
> xj[3]
[1] "2000-01-03 CET"
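Applied to the data frame from the question, the same fix (one year value per row instead of a single value) should keep the dates intact after row selection. This is a minimal sketch assuming only the year needs changing; tz = "UTC" is an added assumption to make the output independent of the local time zone, and note that this assignment does not recompute the stored wday and yday components.

```r
date <- c("2014-01-01", "2015-01-02", "2016-01-03")
val  <- c("a", "b", "c")
d <- data.frame(date, val, stringsAsFactors = FALSE)
d$date <- strptime(d$date, format = "%Y-%m-%d", tz = "UTC")

d$date2000 <- d$date
# POSIXlt years count from 1900, so 100L means 2000. Repeating the value
# once per row keeps every component of the POSIXlt the same length.
d$date2000$year <- rep(100L, nrow(d))

# Each core component should now hold one entry per row
lengths(unclass(d$date2000)[c("sec", "min", "hour", "mday", "mon", "year")])

d[c(TRUE, TRUE, TRUE), ]
```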

It seems that when printing xj (or d), the value of xj$year is recycled, but when extracting only xj[3], R tries to build the POSIXlt element and fails because the year component has no third element. Indeed, if we try with two elements instead of one or three, we can see the vector being recycled:

> xj$year <- c(100,101)
> xj
[1] "2000-01-01 CET" "2001-01-02 CET" "2000-01-03 CET"
> xj[2]
[1] "2001-01-02 CET"
> xj[3]
[1] NA
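If the goal is simply to shift the year, one way to avoid ragged POSIXlt components altogether is to store the column as POSIXct, which is a single numeric vector, and rebuild the year through a format string. This is a sketch under the assumptions that only the calendar date matters (times of day are dropped) and that tz = "UTC" is acceptable; it is an alternative approach, not the one used in the answer above.

```r
date <- c("2014-01-01", "2015-01-02", "2016-01-03")
d <- data.frame(date, val = c("a", "b", "c"), stringsAsFactors = FALSE)

# POSIXct stores one number per row, so row selection cannot misalign
# separate components the way it can with a hand-edited POSIXlt.
d$date <- as.POSIXct(d$date, format = "%Y-%m-%d", tz = "UTC")

# Write the new year literally into the format string;
# %m and %d carry over the original month and day.
d$date2000 <- as.POSIXct(format(d$date, "2000-%m-%d"),
                         format = "%Y-%m-%d", tz = "UTC")

d[c(TRUE, TRUE, TRUE), ]
```

Because the edit goes through a character round trip, R validates and normalizes each date when it parses it back, instead of relying on recycling of individual components.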
