`data.table` error: "reorder received irregular lengthed list" in setkey

Problem description

I have a fairly basic data.table in R, with 250k rows and 90 columns. I am trying to key the data.table on one of the columns which is of class character. When I call:

setkey(my.dt,my.column)

I receive the following cryptic error message:

"Error in setkeyv(x, cols, verbose=verbose) :
reorder received irregular lengthed list"

I have found a source-code commit with this message, but can't quite decipher what it means. My key column contains no NA or blank values, seems perfectly reasonable to look at (it contains stock tickers), and behaves well with the default order() command.

Even more frustrating, the following code completes correctly:

first.dt <- my.dt[1:100000]
setkey(first.dt,my.column)
second.dt <- my.dt[100001:nrow(my.dt)]
setkey(second.dt,my.column)

I have no idea what is going on.

Edit 1: I have confirmed every value in the key fits a fairly standard format:

> length(grep("[A-Z]{3,4}\\.[A-Z]{2}",my.dt$my.column)) == nrow(my.dt)
[1] TRUE

Edit 2: My system info is below (note that I'm actually using Windows 7). I am using data.table version 1.8.

> Sys.info()
          sysname           release           version          nodename           machine             login 
        "Windows" "Server 2008 x64"      "build 7600" "WIN-9RH28AH0CKG"          "x86-64"   "Administrator" 
             user    effective_user 
  "Administrator"   "Administrator" 


Recommended answer

sapply(my.dt, length)

I suspect that one or more columns have a different length to the first column, and that's an invalid data.table. It won't be one of the first 5 because your .Internal(inspect(my.dt)) (thanks) shows those and they're ok.

If so, there is this bug fix in v1.8.1 :


o rbind() of DT with an irregular list() now recycles the list items correctly, #2003. Test added.

Any chance there's an rbind() at an earlier point to create my.dt together with an irregular lengthed list? If not, please step through your code running the sapply(my.dt,length) to see where the invalidly lengthed column is being created. Armed with that we can make a work around and also fix the potential bug. Thanks.
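
To make that step-through quicker, a small helper can list only the offending columns instead of printing all 90 lengths. This is a minimal sketch in base R: `find_irregular_columns` is a hypothetical name (not part of data.table), and a plain list stands in for `my.dt`, since a data.table is a list of columns underneath.

```r
# Hypothetical helper (not a data.table function): return the lengths of
# all columns whose length differs from that of the first column.
find_irregular_columns <- function(x) {
  lens <- vapply(x, length, integer(1))  # one length per column, named
  lens[lens != lens[1]]                  # keep only the mismatches
}

# A plain list stands in for the suspect table; column b is too short.
suspect <- list(a = 6:1, b = 4:1, c = letters[1:6])
find_irregular_columns(suspect)
```

Calling it after each step that builds or modifies the table should show where a column of the wrong length first appears; an empty result means all columns agree with the first one.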

Edit:

The original cryptic error message is now improved in v1.8.1, as follows :

DT = list(a=6:1,b=4:1)
setattr(DT,"class",c("data.table","data.frame"))
setkey(DT,a)

Error in setkeyv(x, cols, verbose = verbose) : 
  Column 2 is length 4 which differs from length of column 1 (6). Invalid
  data.table. Check NEWS link at top of ?data.table for latest bug fixes. If
  not already reported and fixed, please report to datatable-help.

NB: This method of creating a data.table is not recommended, because it lets you create an invalid data.table. Use it only if you are really sure the list is regular and you really do need the speed (i.e. you want to avoid the checks that as.data.table() and data.table() do), or if you need to demonstrate an invalid data.table, as I'm doing here.
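
For contrast, here is a hedged sketch of passing the same irregular list through `as.data.table()`, which performs the checks mentioned above. What happens to irregular input depends on the installed data.table version (it may be rejected with an error, or recycled, possibly with a warning), so the code handles both outcomes rather than assuming one. The variable names `cols` and `res` are illustrative only.

```r
# Sketch only: the outcome depends on the installed data.table version.
if (requireNamespace("data.table", quietly = TRUE)) {
  cols <- list(a = 6:1, b = 4:1)  # irregular: lengths 6 and 4

  # Capture an error, if any, instead of stopping the script.
  res <- tryCatch(data.table::as.data.table(cols), error = function(e) e)

  if (inherits(res, "error")) {
    message("Rejected: ", conditionMessage(res))
  } else {
    # If a table was built, its columns should all have the same length.
    print(vapply(res, length, integer(1)))
  }
}
```

Either way, no table with columns of differing lengths is produced, which is the situation the `setattr()` shortcut allows.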
