Negative number of rows in data.table after incorrect use of set


Problem description

I've come across something a bit weird, especially because the code can give different output each time it's run. In a nutshell, I was incorrectly using set to assign a value to a row beyond the last one, but instead of doing nothing, set created a data.table with a negative length.

library(data.table)

dt<-data.table(id=1:5, var=rnorm(5)) # normal example

set(dt, 6L, 1L, 3L) # sets nothing, as expected
dt
#
# now my real data, after I found the error in my code (incorrect row number in set)
#
dt1 <- data.table(ID = "29502509", FY = 2012, VAR = 61067.5442975645, 
                      startDate = structure(15062L, class = c("IDate", "Date")), 
                      endDate = structure(15429L, class = c("IDate", "Date")), 
                      start = "1750", end = "2404",
                      date = structure(15461L,class = c("IDate", "Date")),
                      DESCR = "JOB", NOTE = "NEW")

set(dt1, 12L, 3L, 62385.6516144086)
str(dt1)
Classes ‘data.table’ and 'data.frame':  1 obs. of  10 variables:
 $ ID       : chr "29502509"
 $ FY       : num 2012
 $ VAR      : num 61068
 $ startDate: IDate, format: "2011-03-29"
 $ endDate  :
Error in do.call(str, c(list(object = obj), aList, list(...)), quote = TRUE) : 
  negative length vectors are not allowed
> sapply(dt1, length)
        ID         FY        VAR  startDate    endDate      start        end       date 
         1          1          1          1 -637110831          1          1          1 
     DESCR       NOTE 
         1          1 
> dput(dt1)
structure(list(ID = "29502509", FY = 2012, VAR = 61067.5442975645, 
    startDate = structure(15062L, class = c("IDate", "Date")), 
    endDate = structure(, class = c("IDate", "Date")), start = "1750", # HERE
    end = "2404", date = structure(15461L, class = c("IDate", 
    "Date")), DESCR = "JOB", NOTE = "NEW"), .Names = c("ID", 
"FY", "VAR", "startDate", "endDate", "start", "end", "date", 
"DESCR", "NOTE"), row.names = c(NA, -1L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000000000130788>)

As I said above, you may need to run the entire code several times to see this, from the creation of the data.table dt1 <- data.table(... to set(dt1,..., because I noticed that if it doesn't happen the first time, it never happens unless I re-run dt1 <- data.table(... . Any ideas?

EDIT:

To be specific, when I say different results I mean that sometimes it does nothing (as expected), but most of the time it creates a negative-length column, always a Date column, and sometimes it creates an entire data.table with a negative number of rows. Also, in the last two cases (single column or entire data.table) the negative length is always -637110831.

Solution

Looks like memory corruption due to writing beyond the memory allocated for the column.

set calls assign in assign.c. From version 1.8.8, assign.c:434:

434             default :
435                 for (r=0; r<targetlen; r++)
436                     memcpy((char *)DATAPTR(targetcol) + (INTEGER(rows)[r]-1)*size, 
437                            (char *)DATAPTR(RHS) + (r%vlen) * size,
438                            size);

This code is reached even though the row index is out of range, which should not be the case. At this point:

(gdb) p INTEGER(rows)[0]
$21 = 12
(gdb) p size
$23 = 8
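
To make the arithmetic concrete: with a row index of 12 and an element size of 8, the memcpy at assign.c:436 starts writing at byte (12-1)*8 = 88 of a column that holds a single 8-byte value. That is consistent with a neighbouring object (here apparently the endDate column) having its length overwritten. The following is only a minimal sketch of that offset calculation and of the kind of range check that would prevent the write. It is not data.table's actual code or fix, and `write_offset` and `checked_assign` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset at which a 1-based row index lands, mirroring the
   (INTEGER(rows)[r]-1)*size expression in the memcpy call above. */
size_t write_offset(int row, size_t size)
{
    assert(row >= 1);
    return (size_t)(row - 1) * size;
}

/* Hypothetical guard: copy one element of `size` bytes into `col`,
   which holds `nrow` elements, only if the 1-based `row` is in range.
   Returns 1 if the value was written, 0 if the write was skipped. */
int checked_assign(void *col, int nrow, int row, const void *value, size_t size)
{
    if (row < 1 || row > nrow)
        return 0;   /* out of range: leave memory untouched */
    memcpy((char *)col + write_offset(row, size), value, size);
    return 1;
}
```

With a one-element column of doubles, `checked_assign(col, 1, 12, &v, sizeof v)` returns 0 and leaves `col` untouched, whereas the unchecked memcpy would write 88 bytes past the start of the buffer.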
