Warning message: In rbindlist(allargs) : NAs introduced by coercion: possible bug in data.table?
Question
While analysing some data, I came across the warning message below. I suspect it is a bug, since it comes from a pretty straightforward command that I have used many times.
Warning message:
In rbindlist(allargs) : NAs introduced by coercion
I was able to reproduce the problem. Here is code with which you should be able to reproduce it as well.
library(data.table)

# random 10-letter names for column V1
set.seed(45)
n <- sapply(1:500, function(x) {
    paste(sample(c(letters[1:26]), 10), collapse="")
})
# generate some values for V2 and V3
dt <- data.table(V1 = sample(n, 30*500, replace = TRUE),
                 V2 = sample(1:10, 30*500, replace = TRUE),
                 V3 = sample(50:100, 30*500, replace = TRUE))
setkey(dt, "V1")
# No warning when providing column names (and right results)
dt[, list(s = sum(V2), m = mean(V3)),by=V1]
# V1 s m
# 1: acgmqyuwpe 238 74.97778
# 2: adcltygwsq 204 79.94118
# 3: adftozibnh 165 75.51515
# 4: aeuowtlskr 164 75.70968
# 5: ahfoqclkpg 192 73.20000
# ---
# 496: zuqegoxkpi 93 77.95000
# 497: zwpserimgf 178 72.62963
# 498: zxkpdrlcsf 154 78.04167
# 499: zxvoaeflhq 121 75.34615
# 500: zyiwcsanlm 180 76.61290
# Warning message and results with NA
dt[, list(sum(V2), mean(V3)),by=V1]
# V1 V1 V2
# 1: acgmqyuwpe 238 74.97778
# 2: adcltygwsq 204 79.94118
# 3: adftozibnh 165 75.51515
# 4: aeuowtlskr 164 75.70968
# 5: ahfoqclkpg 192 73.20000
# ---
# 496: zuqegoxkpi NA 77.95000
# 497: zwpserimgf NA 72.62963
# 498: zxkpdrlcsf NA 78.04167
# 499: zxvoaeflhq NA 75.34615
# 500: zyiwcsanlm NA 76.61290
Warning message:
In rbindlist(allargs) : NAs introduced by coercion
1) It seems that this happens if you don't provide the column names.
2) Even then, it seems to happen only when V1 (or whichever column you use in by=) has a lot of unique entries (500 here) and you don't specify column names. That is, it DOES NOT happen when the by= column V1 has fewer unique entries. For example, try changing the code for n from sapply(1:500, ... to sapply(1:50, ... and you'll get no warning.

What's going on here? This is R version 2.15.0 on a MacBook Pro with OS X 10.8.2 (although I also tested it on another MacBook Pro with 2.15.2). Here's the sessionInfo().

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.8.6 reshape2_1.2.2

loaded via a namespace (and not attached):
[1] plyr_1.8 stringr_0.6.2 tools_2.15.0
Just reproduced it with 2.15.2:

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.8.6
Recommended answer
UPDATE: Now fixed in v1.8.9 by Ricardo.
o rbind'ing data.tables containing duplicate, "" or NA column names now works, #2726 & #2384. Thanks to Garrett See and Arun Srinivasan for reporting. This also affected the printing of data.tables with duplicate column names since the head and tail are rbind-ed together internally.
Yes, bug. It seems to be in the print method of data.tables with duplicated names.

ans = dt[, list(sum(V2), mean(V3)),by=V1]
head(ans)
           V1  V1       V2    # notice the duplicated V1
1: acgmqyuwpe 140 78.07692
2: adcltygwsq 191 76.93333
3: adftozibnh 153 73.82143
4: aeuowtlskr 122 73.04348
5: ahfoqclkpg 143 75.83333
6: ahtczyuipw 135 73.54167
tail(ans)
           V1  V1       V2
1: zugrnehpmq 189 72.63889
2: zuqegoxkpi 150 76.03333
3: zwpserimgf 180 74.81818
4: zxkpdrlcsf 115 72.57895
5: zxvoaeflhq 157 76.53571
6: zyiwcsanlm 145 72.79167
print(ans)
Error in rbindlist(allargs) :
  (converted from warning) NAs introduced by coercion
rbind(head(ans),tail(ans))
Error in rbindlist(allargs) :
  (converted from warning) NAs introduced by coercion
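To check whether a result is affected, it may help to inspect the column names directly instead of printing the table. The following is only a rough sketch on a small made-up table (the data and the replacement names "s" and "m" are assumptions for illustration, not from the question). It relies on unnamed j columns being auto-named V1, V2, ..., as the output above shows, so the first one collides with a by= column that is also called V1.

```r
library(data.table)

# Small illustrative table; the values are made up for this sketch.
set.seed(45)
dt_small <- data.table(V1 = sample(letters, 200, replace = TRUE),
                       V2 = sample(1:10, 200, replace = TRUE),
                       V3 = sample(50:100, 200, replace = TRUE))

# Unnamed j columns, grouped by a column that is itself called V1.
ans_small <- dt_small[, list(sum(V2), mean(V3)), by = V1]

# Inspect the names without going through the print method.
nms <- names(ans_small)
has_dup <- anyDuplicated(nms) > 0   # TRUE if any column name is repeated

# If duplicates are present, rename all columns by position.
if (has_dup) setnames(ans_small, c("V1", "s", "m"))
```

Renaming by position with setnames() avoids having to refer to the ambiguous duplicated name.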
As a workaround, don't create a data.table with column names V1, V2 etc.
It's arising due to this known bug:
and I've added a link there back to this question.
Thanks!
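The workaround can be sketched in two ways: name the result columns in j (as the question's first query does), or give the input columns names other than V1, V2, ... so the automatic names cannot collide with them. This is a minimal sketch on a tiny made-up table; the names id, x, y, s and m are assumptions chosen for illustration.

```r
library(data.table)

# Tiny illustrative table; the values are made up for this sketch.
dt_demo <- data.table(V1 = c("a", "a", "b"),
                      V2 = c(1L, 2L, 3L),
                      V3 = c(50, 60, 70))

# Option 1: name the result columns explicitly in j.
named <- dt_demo[, list(s = sum(V2), m = mean(V3)), by = V1]

# Option 2: avoid V1, V2, ... as input column names, so the
# auto-generated names of unnamed j columns cannot clash with by=.
dt_renamed <- copy(dt_demo)
setnames(dt_renamed, c("id", "x", "y"))
unnamed <- dt_renamed[, list(sum(x), mean(y)), by = id]

names(named)    # "V1" "s" "m"
names(unnamed)  # "id" "V1" "V2"
```

In both cases the result has no repeated column names, so head and tail can be bound together for printing.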