First bracketed assignment is as time-consuming as full assignment?

Question

Regarding this answer to "What exactly is copy-on-modify semantics in R, and where is the canonical source?":

We can see that the first time a vector is altered with '[<-', R copies the entire vector, even if only a single entry is modified. The second time, however, the vector is altered "in place". This is noticeable without inspecting the objects' addresses if we measure the time taken to create and modify a large vector:

> system.time(a <- rep(1L, 10^8))
   user  system elapsed 
   0.15    0.17    0.31 
> system.time(a[222L] <- 111L)
   user  system elapsed 
   0.26    0.08    0.34 
> system.time(a[333L] <- 111L)
   user  system elapsed 
      0       0       0

Note that there is no change of type/storage.mode.
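The remark about storage.mode matters because a bracket assignment that changes the vector's type can never be done in place. R has to allocate a new vector of the wider type and coerce every element. The following base-R sketch reuses the variable name a only for illustration:

```r
a <- rep(1L, 10)
a[2L] <- 111L              # integer value into an integer vector: type unchanged
print(storage.mode(a))     # [1] "integer"
a[3L] <- 111               # 111 (no L suffix) is a double, so all of a is coerced
print(storage.mode(a))     # [1] "double"
```

The timings in the question therefore isolate the duplication cost. The integer literal 111L keeps the type unchanged, so coercion plays no part.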

So the question is: why is it not possible to optimize the first bracket assignment as well? In what situations is this kind of behaviour (a full copy at the first modification) actually needed?

EDIT: (spoiler!) As explained in the accepted answer below, this is nothing but an artifact of enclosing the first assignment in a system.time() call. This causes R to mark the memory bound to a as possibly being referenced by more than one symbol, so it must be duplicated when changed. If we remove the enclosing call, the vector is modified in place from the very first bracket assignment.
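A hedged way to check this without relying on timings is base R's tracemem(). It prints a "tracemem[...]" line each time a traced object is duplicated. It only works if R was built with memory profiling, so the sketch checks capabilities("profmem") first. The names plain and wrapped are hypothetical. Whether a message appears can depend on the R version: R 4.0.0 replaced the NAMED mechanism with reference counting.

```r
# Hypothetical comparison: creation outside vs. inside system.time().
plain <- rep(1L, 1e6)                   # plain top-level assignment
system.time(wrapped <- rep(1L, 1e6))    # assignment enclosed in system.time()

if (capabilities("profmem")) {          # tracemem() needs memory profiling
  tracemem(plain)
  tracemem(wrapped)
  plain[1L] <- 2L    # per the answer: expected in place, no tracemem line
  wrapped[1L] <- 2L  # per the answer: a tracemem line (a copy) on NAMED-era R
  untracemem(plain)
  untracemem(wrapped)
}
```

Whatever tracemem reports, both vectors end up with the same contents. Only the cost of the first assignment differs.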

Thanks, Martin, for the in-depth solution!

Solution

Compare the "NAM()" part of

> a <- rep(1L, 10)
> .Internal(inspect(a))
@457b840 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,1,1,1,1,...

versus

> system.time(a <- rep(1L, 10))
[...]
> .Internal(inspect(a))
@4626f88 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,1,1,1,1,...

The "1" in the first example means that R thinks there is one reference to a, so it can be updated in place. The "2" means that R thinks there have been at least two references to a, so duplication is required if it is modified. Roughly, I rationalize this as two representations of the value returned by rep(): one inside system.time and one outside it. It is the moral equivalent of f = function() { x <- rep(1L, 10); x }; a = f() rather than g = function() rep(1L, 10); a = g().
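The f() and g() comparison can be run directly. A second case shows when the copy is genuinely needed: two names bound to the same vector. The names a1, a2 and b are illustrative. .Internal(inspect()) is an undocumented debugging aid, and its output format varies by version. Before R 4.0.0 it shows NAM(n); in R 4.0.0 and later, which use reference counting, it shows REF(n) instead.

```r
# The answer's f() and g(): same value, different history of references.
f <- function() { x <- rep(1L, 10); x }  # value is also bound to local x
g <- function() rep(1L, 10)              # value is returned directly
a1 <- f()
a2 <- g()
.Internal(inspect(a1))   # compare the NAM()/REF() field of the two
.Internal(inspect(a2))

# The situation where the full copy is actually required:
a <- rep(1L, 10)
b <- a          # a and b now refer to one shared vector
a[1L] <- 0L     # must duplicate; modifying in place would also change b
```

Without the duplication, b would silently see the change made through a. That is exactly what copy-on-modify semantics forbids.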

The real-world code a <- rep(1L, 10^8); a[123L] <- 231L would not involve a copy. We can time the assignment, without artificially incrementing the NAMED count, with:

> a <- rep(1L, 10^8)
> .Internal(inspect(a))
@7f972b571010 13 INTSXP g0c7 [NAM(1)] (len=100000000, tl=0) 1,1,1,1,1,...
> system.time(a[123L] <- a[321L])
   user  system elapsed 
      0       0       0 
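The following hypothetical function (fill_evens is an illustrative name, not from the answer) shows why the in-place path matters in real-world code. It preallocates a vector and fills it one element at a time. Because only the local out refers to the vector, R can normally update it in place. If every out[i] <- ... copied the whole vector, the loop would cost O(n^2) instead of O(n).

```r
# Hypothetical example: fill a preallocated integer vector element by element.
fill_evens <- function(n) {
  out <- integer(n)          # preallocate; only `out` refers to this vector
  for (i in seq_len(n)) {
    out[i] <- 2L * i         # integer * integer keeps out an integer vector
  }
  out
}

# Time only the call; no binding is created inside system.time() here.
system.time(fill_evens(1e6))
```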
