Modified date inside data.frame becomes <NA> after selection
Problem description
I have a data.frame d containing some POSIX dates for which I want to modify the year by d$date$year <- 100. This seems to work fine at first, however after selecting some rows in this data.frame all but the first modified dates are converted to <NA>. What am I doing wrong here? See the code below. (R-Fiddle)
date <- c("2014-01-01","2015-01-02","2016-01-03")
val <- c("a","b","c")
d <- data.frame(date,val)
d$date <- strptime(d$date,format="%Y-%m-%d")
d
#        date val
#1 2014-01-01   a
#2 2015-01-02   b
#3 2016-01-03   c
# correct date as expected
d[c(TRUE,TRUE,TRUE),]
#        date val
#1 2014-01-01   a
#2 2015-01-02   b
#3 2016-01-03   c
# correct dates as expected
d$date2000 <- d$date
d$date2000$year <- 100 # set year to 2000
d
#        date val   date2000
#1 2014-01-01   a 2000-01-01
#2 2015-01-02   b 2000-01-02
#3 2016-01-03   c 2000-01-03
# correct dates as expected
d[c(TRUE,TRUE,TRUE),]
#        date val   date2000
#1 2014-01-01   a 2000-01-01
#2 2015-01-02   b       <NA>
#3 2016-01-03   c       <NA>
# first entry correct, second and third entry <NA>
Solution
When does this problem occur?
It seems to occur during the call to the function [.data.frame (see d[c(TRUE,TRUE,TRUE),] but also d[1:3,] or even d[3,]). Here is the definition of that function:
> `[.data.frame`
function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 1)
{
    mdrop <- missing(drop)
    Narg <- nargs() - (!mdrop)
    has.j <- !missing(j)
    if (!all(names(sys.call()) %in% c("", "drop")) && !isS4(x))
        warning("named arguments other than 'drop' are discouraged")
    if (Narg < 3L) {
        if (!mdrop)
            warning("'drop' argument will be ignored")
        if (missing(i))
            return(x)
        if (is.matrix(i))
            return(as.matrix(x)[i])
        nm <- names(x)
        if (is.null(nm))
            nm <- character()
        if (!is.character(i) && anyNA(nm)) {
            names(nm) <- names(x) <- seq_along(x)
            y <- NextMethod("[")
            cols <- names(y)
            if (anyNA(cols))
                stop("undefined columns selected")
            cols <- names(y) <- nm[cols]
        }
        else {
            y <- NextMethod("[")
            cols <- names(y)
            if (!is.null(cols) && anyNA(cols))
                stop("undefined columns selected")
        }
        if (anyDuplicated(cols))
            names(y) <- make.unique(cols)
        attr(y, "row.names") <- .row_names_info(x, 0L)
        attr(y, "class") <- oldClass(x)
        return(y)
    }
    if (missing(i)) {
        if (drop && !has.j && length(x) == 1L)
            return(.subset2(x, 1L))
        nm <- names(x)
        if (is.null(nm))
            nm <- character()
        if (has.j && !is.character(j) && anyNA(nm)) {
            names(nm) <- names(x) <- seq_along(x)
            y <- .subset(x, j)
            cols <- names(y)
            if (anyNA(cols))
                stop("undefined columns selected")
            cols <- names(y) <- nm[cols]
        }
        else {
            y <- if (has.j)
                .subset(x, j)
            else x
            cols <- names(y)
            if (anyNA(cols))
                stop("undefined columns selected")
        }
        if (drop && length(y) == 1L)
            return(.subset2(y, 1L))
        if (anyDuplicated(cols))
            names(y) <- make.unique(cols)
        nrow <- .row_names_info(x, 2L)
        if (drop && !mdrop && nrow == 1L)
            return(structure(y, class = NULL, row.names = NULL))
        else {
            attr(y, "class") <- oldClass(x)
            attr(y, "row.names") <- .row_names_info(x, 0L)
            return(y)
        }
    }
    xx <- x
    cols <- names(xx)
    x <- vector("list", length(x))
    x <- .Internal(copyDFattr(xx, x))
    oldClass(x) <- attr(x, "row.names") <- NULL
    if (has.j) {
        nm <- names(x)
        if (is.null(nm))
            nm <- character()
        if (!is.character(j) && anyNA(nm))
            names(nm) <- names(x) <- seq_along(x)
        x <- x[j]
        cols <- names(x)
        if (drop && length(x) == 1L) {
            if (is.character(i)) {
                rows <- attr(xx, "row.names")
                i <- pmatch(i, rows, duplicates.ok = TRUE)
            }
            xj <- .subset2(.subset(xx, j), 1L)
            return(if (length(dim(xj)) != 2L) xj[i] else xj[i, , drop = FALSE])
        }
        if (anyNA(cols))
            stop("undefined columns selected")
        if (!is.null(names(nm)))
            cols <- names(x) <- nm[cols]
        nxx <- structure(seq_along(xx), names = names(xx))
        sxx <- match(nxx[j], seq_along(xx))
    }
    else sxx <- seq_along(x)
    rows <- NULL
    if (is.character(i)) {
        rows <- attr(xx, "row.names")
        i <- pmatch(i, rows, duplicates.ok = TRUE)
    }
    for (j in seq_along(x)) {
        xj <- xx[[sxx[j]]]
        x[[j]] <- if (length(dim(xj)) != 2L)
            xj[i]
        else xj[i, , drop = FALSE]
    }
    if (drop) {
        n <- length(x)
        if (n == 1L)
            return(x[[1L]])
        if (n > 1L) {
            xj <- x[[1L]]
            nrow <- if (length(dim(xj)) == 2L)
                dim(xj)[1L]
            else length(xj)
            drop <- !mdrop && nrow == 1L
        }
        else drop <- FALSE
    }
    if (!drop) {
        if (is.null(rows))
            rows <- attr(xx, "row.names")
        rows <- rows[i]
        if ((ina <- anyNA(rows)) | (dup <- anyDuplicated(rows))) {
            if (!dup && is.character(rows))
                dup <- "NA" %in% rows
            if (ina)
                rows[is.na(rows)] <- "NA"
            if (dup)
                rows <- make.unique(as.character(rows))
        }
        if (has.j && anyDuplicated(nm <- names(x)))
            names(x) <- make.unique(nm)
        if (is.null(rows))
            rows <- attr(xx, "row.names")[i]
        attr(x, "row.names") <- rows
        oldClass(x) <- oldClass(xx)
    }
    x
}
<bytecode: 0x7fe8cc3a5548>
<environment: namespace:base>
The relevant bit happens here:
for (j in seq_along(x)) {
    xj <- xx[[sxx[j]]]
    x[[j]] <- if (length(dim(xj)) != 2L)
        xj[i]
    else xj[i, , drop = FALSE]
}
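
To make that loop concrete, here is a minimal sketch of the same per-column subsetting outside of `[.data.frame`. The helper name `subset_columns` and the toy data frame `df` are made up for illustration, and the real method does more work (row names, `drop` handling) than this shows.

```r
# Hypothetical helper mirroring the loop in `[.data.frame`:
# every column is subset on its own with the row index i.
subset_columns <- function(df, i) {
  lapply(df, function(col) {
    if (length(dim(col)) != 2L) col[i] else col[i, , drop = FALSE]
  })
}

# Toy data frame (not from the question) with two plain columns
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
res <- subset_columns(df, 3)
str(res)
```

Because each column is subset with its own `[` method, a POSIXlt column is handled by the POSIXlt subsetting code rather than by anything specific to data frames.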
At this point (in the d[3,] example for instance), we have this:
> str(xx)
'data.frame':   3 obs. of  3 variables:
 $ date    : POSIXlt, format: "2014-01-01" "2015-01-02" "2016-01-03"
 $ val     : Factor w/ 3 levels "a","b","c": 1 2 3
 $ date2000: POSIXlt, format: "2000-01-01" "2000-01-02" "2000-01-03"
> str(x)
List of 3
 $ date    : NULL
 $ val     : NULL
 $ date2000: NULL
> i
[1] 3
> str(sxx)
 int [1:3] 1 2 3
For j=3 we have:
> str(xj)
 POSIXlt[1:3], format: "2000-01-01" "2000-01-02" "2000-01-03"
> dim(xj)
NULL
> xj[3]
[1] NA
So this is where it fails. I think the problem comes (as you noted) from the fact that you replaced d$date2000$year by 1 value instead of 3:
> xj$wday
[1] 3 5 0
> xj$year
[1] 100
> xj[3]
[1] NA
> xj$year <- c(100,100,100)
> xj[3]
[1] "2000-01-03 CET"
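
One way to spot this kind of mismatch is to compare the lengths of the components stored inside the POSIXlt object. The sketch below rebuilds a vector like `xj` from the question's dates; the `tz = "UTC"` argument and the variable names `before` and `after` are assumptions added for the example, and the lengths reported after the assignment may depend on the R version.

```r
# Rebuild a POSIXlt vector like the one in the question
# (tz = "UTC" is an assumption to keep the example reproducible).
xj <- strptime(c("2014-01-01", "2015-01-02", "2016-01-03"),
               format = "%Y-%m-%d", tz = "UTC")

# Length of a few of the underlying list components before the change
before <- vapply(unclass(xj)[c("mday", "mon", "year")], length, integer(1))

# Overwrite the year with a single value, as in the question
xj$year <- 100

# Same check afterwards: a year length that differs from the
# other components points to the problem described above
after <- vapply(unclass(xj)[c("mday", "mon", "year")], length, integer(1))

print(before)
print(after)
```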
It seems that when displaying xj (or d), the value for xj$year is recycled, but when displaying only xj[3] it tries to build the POSIXlt from the third element of each component and fails because xj$year has no third element. And indeed if we try with two elements, instead of one or three, we can see the vector being recycled:
> xj$year <- c(100,101)
> xj
[1] "2000-01-01 CET" "2001-01-02 CET" "2000-01-03 CET"
> xj[2]
[1] "2001-01-02 CET"
> xj[3]
[1] NA
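
Following that reasoning, a possible fix is to give the year one value per row, so that every component of the POSIXlt has the same length. This is a sketch under that assumption, not the only way to do it. The `tz = "UTC"` and `stringsAsFactors = FALSE` arguments and the names `fixed`, `out` and `date2000ct` are additions for the example.

```r
date <- c("2014-01-01", "2015-01-02", "2016-01-03")
val <- c("a", "b", "c")
d <- data.frame(date, val, stringsAsFactors = FALSE)
d$date <- strptime(d$date, format = "%Y-%m-%d", tz = "UTC")

d$date2000 <- d$date
# One year value per row instead of a single value to be recycled
d$date2000$year <- rep(100L, nrow(d))

# Row selection now keeps all the modified dates
fixed <- d[c(TRUE, TRUE, TRUE), ]
out <- format(fixed$date2000, "%Y-%m-%d")
print(out)

# Optional: keep a POSIXct copy, which is stored as a single
# numeric vector rather than a list of components
d$date2000ct <- as.POSIXct(d$date2000)
```

Assigning `rep(100L, nrow(d))` keeps the original approach of editing the `year` component directly, and only changes the length of the value being assigned.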