Warning message: in rbindlist(allargs) : NAs introduced by coercion: possible bug in data.table?

Views: 338 · Published: 2017/3/12 10:30:42 · Tags: r, data.table

Problem description

While analysing some data, I came across the warning message below. I suspect it is a bug, because it comes from a pretty straightforward command that I have used many times.

```
Warning message:
In rbindlist(allargs) : NAs introduced by coercion
```

I was able to reproduce the error.

```
library(data.table)

# unique random names for column V1
set.seed(45)
n <- sapply(1:500, function(x) {
    paste(sample(c(letters[1:26]), 10), collapse="")
})
# generate some values for V2 and V3
dt <- data.table(V1 = sample(n, 30*500, replace = TRUE), 
                 V2 = sample(1:10, 30*500, replace = TRUE), 
                 V3 = sample(50:100, 30*500, replace = TRUE))
setkey(dt, "V1")

# No warning when providing column names (and right results)
dt[, list(s = sum(V2), m = mean(V3)),by=V1]

#              V1   s        m
#   1: acgmqyuwpe 238 74.97778
#   2: adcltygwsq 204 79.94118
#   3: adftozibnh 165 75.51515
#   4: aeuowtlskr 164 75.70968
#   5: ahfoqclkpg 192 73.20000
#  ---                        
# 496: zuqegoxkpi  93 77.95000
# 497: zwpserimgf 178 72.62963
# 498: zxkpdrlcsf 154 78.04167
# 499: zxvoaeflhq 121 75.34615
# 500: zyiwcsanlm 180 76.61290

# Warning message and results with NA
dt[, list(sum(V2), mean(V3)),by=V1]

#              V1  V1       V2
#   1: acgmqyuwpe 238 74.97778
#   2: adcltygwsq 204 79.94118
#   3: adftozibnh 165 75.51515
#   4: aeuowtlskr 164 75.70968
#   5: ahfoqclkpg 192 73.20000
#  ---                        
# 496: zuqegoxkpi  NA 77.95000
# 497: zwpserimgf  NA 72.62963
# 498: zxkpdrlcsf  NA 78.04167
# 499: zxvoaeflhq  NA 75.34615
# 500: zyiwcsanlm  NA 76.61290

# Warning message:
# In rbindlist(allargs) : NAs introduced by coercion
```

1) It seems that this happens only if you don't provide the column names.

2) Even then, it seems to happen only when V1 (or whichever column you use in by=) has a lot of unique entries (500 here). It does NOT happen when the by= column has fewer unique entries. For example, change the code for n from sapply(1:500, ... to sapply(1:50, ... and you'll get no warning.

What's going on here? This is R version 2.15 on a MacBook Pro with OS X 10.8.2 (I also tested it on another MacBook Pro with R 2.15.2). Here's the sessionInfo().
```
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.8.6 reshape2_1.2.2  

loaded via a namespace (and not attached):
[1] plyr_1.8      stringr_0.6.2 tools_2.15.0 
```
Just reproduced it on 2.15.2:
```
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.8.6
```
Recommended answer
  
UPDATE: Now fixed in v1.8.9 by Ricardo.
  
  o rbind'ing data.tables containing duplicate, "" or NA column names now works, #2726 & #2384. Thanks to Garrett See and Arun Srinivasan for reporting. This also affected the printing of data.tables with duplicate column names since the head and tail are rbind-ed together internally.
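
The note about head and tail may also explain why the question saw the warning with 500 groups but not with 50. As far as I know, print shows every row of a short table and only binds head and tail together once the row count exceeds the `datatable.print.nrows` option. That option is 100 by default in current releases, which is an assumption here and was not checked against v1.8.6. A minimal sketch, where `prints_head_and_tail` is a hypothetical helper and not a data.table function:

```r
library(data.table)

# Hypothetical helper (not part of data.table): TRUE when a table is long
# enough that print() would show only the first and last few rows.
# The threshold of 100 is the assumed default of "datatable.print.nrows".
prints_head_and_tail <- function(x,
                                 nrows = getOption("datatable.print.nrows", 100L)) {
  nrow(x) > nrows
}

small <- data.table(g = 1:50)    # like the sapply(1:50, ...) variant
large <- data.table(g = 1:500)   # like the original sapply(1:500, ...)

prints_head_and_tail(small)
prints_head_and_tail(large)
```

Under that assumption only the 500-row result goes through the internal rbind, which is where the duplicate names caused trouble.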
  
Yes, it's a bug. It looks like it is in the print method for a data.table with duplicate column names.
```
# Output is shown as comments. "(converted from warning)" is what R prints
# when warnings are promoted to errors, i.e. options(warn = 2).
ans = dt[, list(sum(V2), mean(V3)),by=V1]
head(ans)
#            V1  V1       V2     # notice the duplicated V1
# 1: acgmqyuwpe 140 78.07692
# 2: adcltygwsq 191 76.93333
# 3: adftozibnh 153 73.82143
# 4: aeuowtlskr 122 73.04348
# 5: ahfoqclkpg 143 75.83333
# 6: ahtczyuipw 135 73.54167
tail(ans)
#            V1  V1       V2
# 1: zugrnehpmq 189 72.63889
# 2: zuqegoxkpi 150 76.03333
# 3: zwpserimgf 180 74.81818
# 4: zxkpdrlcsf 115 72.57895
# 5: zxvoaeflhq 157 76.53571
# 6: zyiwcsanlm 145 72.79167
print(ans)
# Error in rbindlist(allargs) : 
#     (converted from warning) NAs introduced by coercion
rbind(head(ans),tail(ans))
# Error in rbindlist(allargs) : 
#     (converted from warning) NAs introduced by coercion
```
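
To check for the clash without going through the print method, one can look at names() directly. This is a rough sketch on a small made-up table. The data and the repair with make.unique() are my own illustration and not part of the original answer. The code does not assume that the names actually clash in your data.table version; it only repairs them if they do.

```r
library(data.table)

# Toy data with the same column names as in the question.
dt <- data.table(V1 = rep(c("a", "b", "c"), each = 2),
                 V2 = 1:6,
                 V3 = c(50, 60, 70, 80, 90, 100))

# Unnamed expressions in j get automatic names, which may collide with by=.
ans <- dt[, list(sum(V2), mean(V3)), by = V1]

# Inspect the names directly instead of printing the table.
names(ans)
has_dups <- anyDuplicated(names(ans)) > 0

# If any names clash, rename by position so every name is unique.
if (has_dups) {
  setnames(ans, seq_along(ans), make.unique(names(ans)))
}
```

Renaming by position avoids having to look columns up by a name that occurs twice.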
As a workaround, don't create a data.table with column names V1, V2, etc.
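
A sketch of two ways to stay clear of the clash, on a toy table with the hypothetical column names id, value and score (these are not from the question): either give the input columns descriptive names, so the automatic V1, V2 cannot collide with them, or name the results in j explicitly, as the question's first call already does.

```r
library(data.table)

# Toy data; id, value and score are illustrative names only.
dt <- data.table(id    = rep(c("a", "b", "c"), each = 2),
                 value = 1:6,
                 score = c(50, 60, 70, 80, 90, 100))

# Option 1: descriptive input names. The unnamed results become V1 and V2,
# which no longer clash with the by= column.
auto <- dt[, list(sum(value), mean(score)), by = id]

# Option 2: name every result explicitly.
named <- dt[, list(s = sum(value), m = mean(score)), by = id]

names(auto)
names(named)
```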
  
It's arising due to this known bug:

#2384 rbind of tables containing duplicate column names doesn't bind correctly

and I've added a link there back to this question.
  
Thanks!
  