Simple way of showing progress in data.table "by" loops


Problem description

I have a slow computer, and some of my R calculations take hours, sometimes days, to run. I'm sure they can be made more efficient, but in the meantime I would like a simple way to show how far along R is in the calculation.

In a loop this can easily be done with print(i). Is something similar available for data.table calculations?

For instance, the following code takes about 50 hours to run on my machine

q[, ties := sum(orig[pnum == origpat, inventors] %in% ref[pnum == ref.pat, inventors]), by = idx]

q is a data.table with columns origpat, ref.pat and idx (an index). The data tables orig and ref both contain columns pnum and inventors. The code simply counts the inventors that appear in both groups, but because it runs once per group (by = idx), it takes a long time. I'd like my screen to post progress, e.g. every 1,000 rows (there are about 20 million rows).

Is there a simple way to do this?

Solution

Try

q[, ties := {
  print(.GRP)
  sum(orig[pnum == origpat, inventors] %in% ref[pnum == ref.pat, inventors])
}, by = idx]

.GRP is data.table's group counter (1 for the first group, 2 for the second, and so on), so printing it is the by-group analogue of print(i) in a loop.
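Printing every group would flood the console with 20 million lines, so it may help to report only every so often, as the question asks. Below is a minimal sketch of that idea, not a tuned solution. The toy tables, the `every` interval and the `n_groups` helper are made up for illustration; only the column names come from the question.

```r
library(data.table)

# Toy stand-ins for the tables in the question (made-up data).
orig <- data.table(pnum = c(1L, 1L, 2L), inventors = c("a", "b", "c"))
ref  <- data.table(pnum = c(10L, 10L, 20L), inventors = c("a", "b", "d"))
q    <- data.table(origpat = c(1L, 2L, 1L),
                   ref.pat = c(10L, 20L, 20L),
                   idx     = 1:3)

every    <- 2L              # hypothetical interval; 1000L would match the question
n_groups <- uniqueN(q$idx)  # total number of groups, counted once up front

q[, ties := {
  # Report only on every `every`-th group; .GRP is the running group counter.
  if (.GRP %% every == 0L) {
    cat("group", .GRP, "of", n_groups, "\n")
    flush.console()  # mainly matters in buffered consoles such as Rgui on Windows
  }
  sum(orig[pnum == origpat, inventors] %in% ref[pnum == ref.pat, inventors])
}, by = idx]
```

The modulo check keeps the cost of reporting small compared with the two lookups in each group, and the result in `ties` is the same as without the progress message.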
