How to avoid an optimization warning in data.table

Views: 80   Posted: 2017/3/12 11:03:03   Tags: r, data.table


Problem description

I have the following code:
> dt <- data.table(a=c(rep(3,5),rep(4,5)),b=1:10,c=11:20,d=21:30,key="a")
> dt
    a  b  c  d
 1: 3  1 11 21
 2: 3  2 12 22
 3: 3  3 13 23
 4: 3  4 14 24
 5: 3  5 15 25
 6: 4  6 16 26
 7: 4  7 17 27
 8: 4  8 18 28
 9: 4  9 19 29
10: 4 10 20 30
> dt[,lapply(.SD,sum),by="a"]
Finding groups (bysameorder=TRUE) ... done in 0secs. bysameorder=TRUE and o__ is length 0
Optimized j from 'lapply(.SD, sum)' to 'list(sum(b), sum(c), sum(d))'
Starting dogroups ... done dogroups in 0 secs
   a  b  c   d
1: 3 15 65 115
2: 4 40 90 140
> dt[,c(count=.N,lapply(.SD,sum)),by="a"]
Finding groups (bysameorder=TRUE) ... done in 0secs. bysameorder=TRUE and o__ is length 0
Optimization is on but j left unchanged as 'c(count = .N, lapply(.SD, sum))'
Starting dogroups ... The result of j is a named list. It's very inefficient to create the same names over and over again for each group. When j=list(...), any names are detected, removed and put back after grouping has completed, for efficiency. Using j=transform(), for example, prevents that speedup (consider changing to :=). This message may be upgraded to warning in future.
done dogroups in 0 secs
   a count  b  c   d
1: 3     5 15 65 115
2: 4     5 40 90 140
How do I avoid the scary "very inefficient" warning?

I can add the count column before aggregating:
> dt$count <- 1
> dt
    a  b  c  d count
 1: 3  1 11 21     1
 2: 3  2 12 22     1
 3: 3  3 13 23     1
 4: 3  4 14 24     1
 5: 3  5 15 25     1
 6: 4  6 16 26     1
 7: 4  7 17 27     1
 8: 4  8 18 28     1
 9: 4  9 19 29     1
10: 4 10 20 30     1
> dt[,lapply(.SD,sum),by="a"]
Finding groups (bysameorder=TRUE) ... done in 0secs. bysameorder=TRUE and o__ is length 0
Optimized j from 'lapply(.SD, sum)' to 'list(sum(b), sum(c), sum(d), sum(count))'
Starting dogroups ... done dogroups in 0 secs
   a  b  c   d count
1: 3 15 65 115     5
2: 4 40 90 140     5
But this does not look very elegant.
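The message in the first transcript says that when j = list(...), names are detected, removed and put back after grouping has completed. Going by that description, one option is to write j out as an explicit list(...) instead of combining .N and lapply(.SD, sum) with c(). This is a sketch: the transcripts do not show whether a particular data.table version prints the message for this form. The cost is that every summed column has to be named by hand:

```r
library(data.table)

dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b = 1:10, c = 11:20, d = 21:30, key = "a")

# Spell j out as an explicit list(...). Per the message text, data.table strips
# the names from a j = list(...) result and restores them after grouping.
res <- dt[, list(count = .N, b = sum(b), c = sum(c), d = sum(d)), by = a]
print(res)
```

This gives the same a / count / b / c / d table as the c(count = .N, lapply(.SD, sum)) call. With many columns, however, listing them by hand quickly becomes tedious.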
Solution
One way I could think of is to assign count by reference:
dt.out <- dt[, lapply(.SD,sum), by = a]
dt.out[, count := dt[, .N, by=a][, N]]
# alternatively: count := table(dt$a)

#    a  b  c   d count
# 1: 3 15 65 115     5
# 2: 4 40 90 140     5
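The assignment above lines the counts up with dt.out by row position. That works because both results are grouped by a on the same table. A variant (a swapped-in technique, not the answer's code) joins the two per-group tables on a explicitly instead. The names sums and counts are illustrative:

```r
library(data.table)

dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b = 1:10, c = 11:20, d = 21:30, key = "a")

sums   <- dt[, lapply(.SD, sum), by = a]  # one row per group: a, b, c, d
counts <- dt[, .(count = .N), by = a]     # one row per group: a, count

# Join the two per-group tables on `a` rather than relying on row order;
# the result has the columns of `sums` followed by `count` from `counts`
dt.out <- sums[counts, on = "a"]
print(dt.out)
```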




Edit 1: I still think it's just a message, not a warning. But if you still want to avoid it, just do:
dt.out[, count := as.numeric(dt[, .N, by=a][, N])]
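On the point that this is only a message: every diagnostic line in the question's transcripts ("Finding groups ...", "Starting dogroups ...") looks like data.table's verbose output. This suggests the question was run with verbose mode switched on. Assuming the "named list" message is part of that same verbose output, as its position between "Starting dogroups" and "done dogroups" suggests, switching verbosity off should silence it. A minimal sketch of the two ways to do that:

```r
library(data.table)

dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b = 1:10, c = 11:20, d = 21:30, key = "a")

# Per call: `[.data.table` has a `verbose` argument
# (its default is getOption("datatable.verbose"))
res <- dt[, c(count = .N, lapply(.SD, sum)), by = a, verbose = FALSE]

# Globally: turn verbose diagnostics off for the rest of the session
options(datatable.verbose = FALSE)
print(res)
```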




Edit 2: Very interesting. Using the functional form of :=, i.e. `:=`(...), which is the form used for assigning several columns at once, does not produce the same message.
dt.out[, `:=`(count = dt[, .N, by=a][, N])]
# Detected that j uses these columns: a 
# Finding groups (bysameorder=TRUE) ... done in 0.001secs. bysameorder=TRUE and o__ is length 0
# Detected that j uses these columns: <none> 
# Optimization is on but j left unchanged as '.N'
# Starting dogroups ... done dogroups in 0 secs
# Detected that j uses these columns: N 
# Assigning to all 2 rows
# Direct plonk of unnamed RHS, no copy.
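The functional form `:=`(...) is mainly useful because it can assign several columns in one call, by reference. In the sketch below, the share column is purely hypothetical and only illustrates a second assignment. As in the answer, it relies on dt.out and the counts being grouped by a on the same data, so their rows line up:

```r
library(data.table)

dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b = 1:10, c = 11:20, d = 21:30, key = "a")
dt.out <- dt[, lapply(.SD, sum), by = a]

# Per-group counts, in the same group order as dt.out
counts <- dt[, .N, by = a][, N]

# Functional form of := : two new columns added by reference in a single call
dt.out[, `:=`(count = counts, share = counts / sum(counts))]
print(dt.out)
```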


                        