Negative number of rows in data.table after incorrect use of set
Problem description
I've come across something a bit weird, especially because the code may give different output each time it's run. In a nutshell, I was incorrectly using set to set a value in a row beyond the last one, but instead of doing nothing, set created a negative-length data.table.
library(data.table)
dt <- data.table(id = 1:5, var = rnorm(5)) # normal example
set(dt, 6L, 1L, 3L) # doesn't set anything, as expected
dt
#
# now my real data, after I found the error in my code (incorrect row number in set)
#
dt1 <- data.table(ID = "29502509", FY = 2012, VAR = 61067.5442975645,
                  startDate = structure(15062L, class = c("IDate", "Date")),
                  endDate = structure(15429L, class = c("IDate", "Date")),
                  start = "1750", end = "2404",
                  date = structure(15461L, class = c("IDate", "Date")),
                  DESCR = "JOB", NOTE = "NEW")
set(dt1, 12L, 3L, 62385.6516144086)
str(dt1)
Classes ‘data.table’ and 'data.frame': 1 obs. of 10 variables:
$ ID : chr "29502509"
$ FY : num 2012
$ VAR : num 61068
$ startDate: IDate, format: "2011-03-29"
$ endDate :
Error in do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) :
negative length vectors are not allowed
> sapply(dt1, length)
ID FY VAR startDate endDate start end date
1 1 1 1 -637110831 1 1 1
DESCR NOTE
1 1
> dput(dt1)
structure(list(ID = "29502509", FY = 2012, VAR = 61067.5442975645,
startDate = structure(15062L, class = c("IDate", "Date")),
endDate = structure(, class = c("IDate", "Date")), start = "1750", # HERE
end = "2404", date = structure(15461L, class = c("IDate",
"Date")), DESCR = "JOB", NOTE = "NEW"), .Names = c("ID",
"FY", "VAR", "startDate", "endDate", "start", "end", "date",
"DESCR", "NOTE"), row.names = c(NA, -1L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000000000130788>)
As I said above, you may need to run the entire code several times to see this, from the creation of the data.table with dt1 <- data.table(...) to set(dt1, ...). I noticed that if it doesn't happen the first time, it never happens unless I re-run dt1 <- data.table(...). Any idea?
EDIT:
To be specific, by different results I mean the following. Sometimes it does nothing (as expected). Most of the time it creates a negative-length column, always the Date column. Sometimes it creates an entire data.table with a negative number of rows. In the last two cases (a single column or the entire data.table), the negative length is always -637110831.
Answer
This looks like memory corruption caused by writing beyond the memory allocated for the column.
set ends up calling assign in assign.c. From version 1.8.8, assign.c:434:
434 default :
435 for (r=0; r<targetlen; r++)
436 memcpy((char *)DATAPTR(targetcol) + (INTEGER(rows)[r]-1)*size,
437 (char *)DATAPTR(RHS) + (r%vlen) * size,
438 size);
The assign.c code quoted above is reached (which should not be the case). At this point:
(gdb) p INTEGER(rows)[0]
$21 = 12
(gdb) p size
$23 = 8