data.table join and j-expression unexpected behavior

Views: 100. Published: 2017/3/12 10:29:59. Tags: r, data.table


Problem description

In R 2.15.0 and data.table 1.8.9:
library(data.table)
d = data.table(a = 1:5, value = 2:6, key = "a")

d[J(3), value]
#   a value
#   3     4

d[J(3)][, value]
#   4
I expected both to produce the same output (the 2nd one) and I believe they should.


In the interest of clearing up that this is not a J syntax issue, same expectation applies to the following (identical to the above) expressions:
t = data.table(a = 3, key = "a")
d[t, value]
d[t][, value]
I would expect both of the above to return the exact same output.

So let me rephrase the question - why is (data.table designed so that) the key column printed out automatically in d[t, value]?

Update (based on answers and comments below): Thanks @Arun et al., I understand the design rationale now. The above prints the key because there is a hidden by present every time you do a data.table merge via the X[Y] syntax, and that by is by the key. The reason it's designed this way seems to be the following: since a by operation has to be performed when merging anyway, one might as well take advantage of it rather than do another by when grouping by the key of the merge.

Now that said, I believe that's a syntax design flaw. The way I read the data.table syntax d[i, j, by = b] is:

  take d, apply the i operation (be that subsetting or merging or whatnot), and then do the j expression "by" b

The by-without-by breaks this reading and introduces cases one has to think about specifically (am I merging on i, is by just the key of the merge, etc). I believe handling this should be the job of data.table itself: the commendable effort to make data.table faster in one particular case of the merge, when the by is equal to the key, should be done in an alternative way (e.g. by checking internally whether the by expression is actually the key of the merge).
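
The "apply i, then do j by b" reading can be sketched as two explicit steps and compared with grouping during the join. This is only an illustration, assuming a recent data.table release in which by = .EACHI is available; the lookup table i_tbl and the column b are hypothetical names introduced for the example.

```r
library(data.table)

# Small keyed table: key value 1 has b = 1, 3, 5 and key value 2 has b = 2, 4, 6
d = data.table(a = rep(1:2, 3), b = 1:6, key = "a")
i_tbl = data.table(a = 1:2, key = "a")   # hypothetical lookup table

# Step 1: join on the key; step 2: aggregate with an explicit by
two_step = d[i_tbl][, sum(b), by = a]

# One step: evaluate j once for each row of i while joining
one_step = d[i_tbl, sum(b), by = .EACHI]

print(two_step)
print(one_step)
```

When the key values in i are unique, as here, both forms give one sum per key value.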
Solution
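As of data.table 1.9.3 (see the NEWS file, //r-forge.r-project.org/scm/viewvc.php/pkg/NEWS?view=markup&root=datatable), the default behavior has been changed, and the two forms from the question, d[J(3), value] and d[J(3)][, value], now produce the same result. To get the by-without-by result, one now has to specify an explicit by = .EACHI: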
d = data.table(a = 1:5, value = 2:6, key = "a")

d[J(3), value]
#[1] 4

d[J(3), value, by = .EACHI]
#   a value
#1: 3     4
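
A quick way to check this on your own installation is to compare the direct and chained forms from the question. This is a minimal sketch that assumes a data.table version at or after the change described above; the variable names direct, chained and each_i are made up for the example.

```r
library(data.table)

d = data.table(a = 1:5, value = 2:6, key = "a")

direct  = d[J(3), value]      # join, then evaluate j on the joined rows
chained = d[J(3)][, value]    # join first, then select in a second step

# Both should be a plain vector under the new default
print(identical(direct, chained))

# Explicit grouping by each row of i returns a data.table with the key column
each_i = d[J(3), value, by = .EACHI]
print(each_i)
```

If identical() reports FALSE, the installed version most likely predates the change.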
And here's a slightly more complicated example, illustrating the difference:
d = data.table(a = 1:2, b = 1:6, key = 'a')
#   a b
#1: 1 1
#2: 1 3
#3: 1 5
#4: 2 2
#5: 2 4
#6: 2 6

# normal join
d[J(c(1,2)), sum(b)]
#[1] 21

# join with a by-without-by, or by-each-i
d[J(c(1,2)), sum(b), by = .EACHI]
#   a V1
#1: 1  9
#2: 2 12

# and a more complicated example:
d[J(c(1,2,1)), sum(b), by = .EACHI]
#   a V1
#1: 1  9
#2: 2 12
#3: 1  9
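
The last example is where by = .EACHI and an ordinary by on the key column stop being interchangeable. The sketch below contrasts the two under the same assumption of a recent data.table version; lookup, each_i and pooled are illustrative names, and a is built with rep() only to make the recycling explicit.

```r
library(data.table)

d = data.table(a = rep(1:2, 3), b = 1:6, key = "a")
lookup = c(1L, 2L, 1L)   # key value 1 appears twice in i

# One group per row of i: the repeated key value yields a repeated result row
each_i = d[J(lookup), sum(b), by = .EACHI]

# Join first, then group by column a: rows from the repeated key value are pooled
pooled = d[J(lookup)][, sum(b), by = a]

print(each_i)
print(pooled)
```

Here each_i has three rows (sums 9, 12, 9), while pooled has two rows (sums 18 and 12), because the second form counts the matches for key value 1 twice within a single group.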


                        