What are the restrictions for the column classes in data.table?

Problem description

UPDATE: This problem is no longer relevant for data.table versions 1.8.0 and higher. From the NEWS file:


character columns are now allowed in keys and are preferred to factor. data.table() and setkey() no longer coerce character to factor. Factors are still supported. Implements FR#1493, FR#1224 and (partially) FR#951.
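
As a minimal sketch of what this NEWS entry means in practice, assuming data.table 1.8.0 or later: character columns can be keyed directly, and a join on them works without any conversion to factor. The names `lookup` and `res` are illustrative and do not come from the original post.

```r
library(data.table)

# Character columns stay character; data.table() does not coerce them to factor.
DT <- data.table(
  Region = rep(c("US", "EU"), each = 10),
  Cat    = c(rep("A", 7), rep("B", 7), rep("C", 6)),
  Year   = 1:20,
  value  = rnorm(20)
)
setkey(DT, Region, Cat, Year)

# Lookup table with character columns; its first three columns are
# matched against the key columns of DT by position.
lookup <- data.table(Region = "US", Cat = "A", Year = 5:8)
res <- DT[lookup]

# Year 8 belongs to category "B" in DT, so that row has no match
# and its value is NA.
print(res)
```

The combination US/A only covers years 1 to 7, so the row for year 8 comes back with a missing `value`, just as in the answer's output further down.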


Original question

I am trying to join two data.tables. However, whether the join succeeds depends on the classes of the columns I use to match them. More precisely, it seems that the columns must not be of class "character". I don't quite understand the reason, and I'm sure I'm missing something obvious here, so help is much appreciated.

Here is an example:

#Objective: Select all rows from DT for which Region=="US", Year >= 5 & Year<=8, Cat="A"                 
library(data.table)
#Set-up data.table DT
DT <- data.table(Year=1:20, value=rnorm(20), Region=c(rep("US", 10), rep("EU", 10)), Cat=c(rep("A", 7), rep("B", 7), rep("C", 6)))
setkey(DT, Region, Cat, Year)
#Set-up data.table int_DT to join with DT
years   <- 5:8
df      <- data.frame(Region=c("US", "EU"), Categ=c("A", "B"))
int_DT <- J(cbind(df[1, ], years))
#Join them: Works like a charm!
DT[int_DT]

#Let's assume that for any reason the columns in df are of class "character"
df$Region <- as.character(df$Region)
df$Categ  <- as.character(df$Categ)
#Rebuild int_DT
int_DT    <- J(cbind(df[1, ], years))
DT[int_DT]    
#Error in `[.data.table`(DT, int_DT) : 
#  unsorted column Region of i is not internally type integer.

#OK, maybe the problem is that the column classes in DT are factors, so change those:
DT[, Cat:=as.character(Cat)]
DT[, Region:=as.character(Region)]

DT[int_DT]
#Error in `[.data.table`(DT, int_DT) : 
#  When i is a data.table, x must be sorted to avoid a vector scan of x per row of i

Still doesn't work. Why? What is the restriction? What am I missing? Additional information: I'm using data.table 1.6.6 and R version 2.13.2 (2011-09-30) on platform x86_64-pc-linux-gnu (64-bit).

Recommended answer

You don't need a join operation to get your desired results. You said: 'Objective: Select all rows from DT for which Region=="US", Year >= 5 & Year<=8, Cat="A"'

DT[Region=="US" & Year>=5 & Year <= 8 & Categ=="A"]
     Year       value Region Categ
[1,]    5 -0.18631697     US     A
[2,]    6  1.40059083     US     A
[3,]    7  0.01848557     US     A
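
For comparison, here is a sketch showing that the plain subset and a keyed join return the same rows once unmatched rows are dropped with `nomatch = 0L`. It assumes a recent data.table version and a column named `Categ`, as used in this answer; `by_scan` and `by_join` are illustrative names.

```r
library(data.table)

set.seed(1)
DT <- data.table(
  Year   = 1:20,
  value  = rnorm(20),
  Region = rep(c("US", "EU"), each = 10),
  Categ  = c(rep("A", 7), rep("B", 7), rep("C", 6))
)

# Vector-scan subset: no key required.
by_scan <- DT[Region == "US" & Year >= 5 & Year <= 8 & Categ == "A"]

# Keyed join: J() builds the lookup table and recycles the length-1
# arguments; nomatch = 0L drops the year without a matching row.
setkey(DT, Region, Categ, Year)
by_join <- DT[J("US", "A", 5:8), nomatch = 0L]

print(by_scan)
print(by_join)
```

Without `nomatch = 0L` the join would also return a row for year 8 with `value` set to NA, which is the difference between the two outputs shown in this answer.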






But to answer your question about column classes: I managed to get this code to work, which essentially mirrors your code above:

> setkey(DT, Region, Categ, Year)
> df      <- data.frame(Region=c("US", "EU"), Categ=c("A", "B"))
> dt2 <- data.table(data.frame(df[1, ], Year=5:8))
Warning message:
In data.frame(df[1, ], Year = 5:8) :
  row names were found from a short variable and have been discarded
> dt1[dt2]
     Region Categ Year      value
[1,]     US     A    5 -0.5565422
[2,]     US     A    6 -0.1805841
[3,]     US     A    7  1.4474403
[4,]     US     A    8         NA






Similarly, with columns of class character:

df$Region <- as.character(df$Region)
df$Categ  <- as.character(df$Categ)
#Rebuild dt2
dt2    <- J(cbind(df[1, ], Year=5:8))

Warning message:
In data.frame(..., check.names = FALSE) :
  row names were found from a short variable and have been discarded

setkey(dt2, Region)
dt1[dt2]
   Region Year       value Categ Categ.1 Year.1
       US    1  1.20152558     A       A      5
       US    2  1.89391079     A       A      5
       US    3 -1.76022634     A       A      5
       US    4  0.92454680     A       A      5
       US    5 -0.55654217     A       A      5
       ...
       snip 
       ...
       US    9  0.67936243     B       A      8
       US   10 -0.09355764     B       A      8
