Non-joins with data.tables

Views: 93 | Published: 2017/3/12 10:26:53 | Tags: r, data.table

Problem description

I have a question on the data.table idiom for "non-joins", inspired from Iterator's question. Here is an example:

library(data.table)

dt1 <- data.table(A1=letters[1:10], B1=sample(1:5,10, replace=TRUE))
dt2 <- data.table(A2=letters[c(1:5, 11:15)], B2=sample(1:5,10, replace=TRUE))

setkey(dt1, A1)
setkey(dt2, A2)

The data.tables look like this

> dt1               > dt2
      A1 B1               A2 B2
 [1,]  a  1          [1,]  a  2
 [2,]  b  4          [2,]  b  5
 [3,]  c  2          [3,]  c  2
 [4,]  d  5          [4,]  d  1
 [5,]  e  1          [5,]  e  1
 [6,]  f  2          [6,]  k  5
 [7,]  g  3          [7,]  l  2
 [8,]  h  3          [8,]  m  4
 [9,]  i  2          [9,]  n  1
[10,]  j  4         [10,]  o  1

To find which rows in dt2 have the same key in dt1, set the which option to TRUE:

> dt1[dt2, which=TRUE]
[1]  1  2  3  4  5 NA NA NA NA NA

Matthew suggested in this answer the "non-join" idiom

dt1[-dt1[dt2, which=TRUE]]

to subset dt1 to those rows that have no match in dt2. On my machine, with data.table v1.7.1, I get an error:

Error in `[.default`(x[[s]], irows): only 0's may be mixed with negative subscripts

Instead, with the option nomatch=0, the "non-join" works:

> dt1[-dt1[dt2, which=TRUE, nomatch=0]]
     A1 B1
[1,]  f  2
[2,]  g  3
[3,]  h  3
[4,]  i  2
[5,]  j  4

Is this intended behavior?

Solution

As far as I know, this is a part of base R.

# This works
(1:4)[c(-2,-3)]

# But this gives you the same error you described above
(1:4)[c(-2, -3, NA)]
# Error in (1:4)[c(-2, -3, NA)] : 
#   only 0's may be mixed with negative subscripts

The textual error message indicates that it is intended behavior.
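The following is a minimal base R sketch of that behavior, not anything specific to data.table. The names x, idx, res, kept_zero and kept_clean are made up for illustration. It shows that an NA next to negative subscripts raises an error, while a 0 is silently ignored, which is presumably why nomatch=0 sidesteps the problem.

```r
x   <- 1:4
idx <- c(2, 3, NA)   # positions to drop, one of them missing

# Negating a vector that contains NA fails in base R;
# tryCatch turns the error into a marker value so the script keeps running
res <- tryCatch(x[-idx], error = function(e) "error")

# Zeros are allowed alongside negative subscripts and are simply ignored
kept_zero <- x[c(-2, -3, 0)]

# Dropping the NA explicitly before negating works as well
kept_clean <- x[-idx[!is.na(idx)]]
```

Both kept_zero and kept_clean contain the elements at positions 1 and 4. The exact wording of the error message may differ between R versions, so the sketch only checks that an error occurs.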

Here's my best guess as to why that is the intended behavior:

From the way they treat NA's elsewhere (e.g. typically defaulting to na.rm=FALSE), it seems that R's designers view NA's as carrying important information, and are loath to drop that without some explicit instruction to do so. (Fortunately, setting nomatch=0 gives you a clean way to pass that instruction along!)

In this context, the designers' preference probably explains why NA's are accepted for positive indexing, but not for negative indexing:

# Positive indexing: works, because the return value retains info about NA's
(1:4)[c(2,3,NA)]

# Negative indexing: doesn't work, because it can't easily retain such info
(1:4)[c(-2,-3,NA)]
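As a rough base R analogy (an assumption for illustration, not a description of how data.table implements its join), match() also takes a nomatch argument, and the key columns from the question can be used as plain character vectors. The variable names below are hypothetical.

```r
A1 <- letters[1:10]             # keys of dt1 in the question
A2 <- letters[c(1:5, 11:15)]    # keys of dt2 in the question

pos_na   <- match(A2, A1)               # unmatched keys become NA
pos_zero <- match(A2, A1, nomatch = 0)  # unmatched keys become 0

# Negative indexing works with the zero-filled positions
non_join <- A1[-pos_zero]

# Caveat: if nothing matches at all, every position is 0, -0 is still 0,
# and the result is empty rather than "everything"
none_matched <- A1[-match(c("x", "y"), A1, nomatch = 0)]

# A logical mask avoids that edge case
non_join_mask <- A1[!(A1 %in% A2)]
```

Here non_join and non_join_mask both hold "f" through "j". The logical mask is arguably the safer form when the set of matches might be empty.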
