Simple way of showing progress in data.table "by" loops
Question
I have a slow computer, and some of my R calculations take hours, sometimes days, to run. I'm sure they can be made more efficient, but in the meantime I would like a simple way to see how far along R is in doing the needed calculations.
In a loop this can easily be done with `print(i)`. Is something similar available when doing data.table calculations?
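For comparison, the loop idiom mentioned above can be throttled so that it reports only every 1,000th iteration instead of every single one. This is a minimal sketch: `n`, `every` and `total` are hypothetical names, and the running sum merely stands in for the real per-iteration work.

```r
n <- 5000L      # hypothetical number of iterations
every <- 1000L  # report progress once per this many iterations
total <- 0
for (i in seq_len(n)) {
  total <- total + i            # stand-in for the real, slow computation
  if (i %% every == 0L) print(i) # prints 1000, 2000, ..., 5000
}
```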
For instance, the following code takes about 50 hours to run on my machine:
```r
q[, ties := sum(orig[pnum == origpat, inventors] %in% ref[pnum == ref.pat, inventors]), by = idx]
```
`q` is a data.table with the columns `origpat`, `ref.pat` and `idx` (an index). The data tables `orig` and `ref` both contain the columns `pnum` and `inventors`. The code simply counts the inventors that appear in both groups, but because of its iterative nature (`by = idx`) it takes a long time.
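To make the data layout concrete, here is a tiny hypothetical example with made-up values (not the asker's data) on which the line above can be run as is. The nested `orig[...]` and `ref[...]` calls see the current group's `origpat` and `ref.pat` values through data.table's scoping rules.

```r
library(data.table)

# Hypothetical toy tables with the column names used in the question
orig <- data.table(pnum = c(1, 1, 2, 2, 2), inventors = c("A", "B", "C", "D", "E"))
ref  <- data.table(pnum = c(10, 10, 20),    inventors = c("B", "X", "C"))
q    <- data.table(idx = 1:3, origpat = c(1, 2, 2), ref.pat = c(10, 10, 20))

# For each idx: how many of origpat's inventors also appear among ref.pat's inventors
q[, ties := sum(orig[pnum == origpat, inventors] %in% ref[pnum == ref.pat, inventors]), by = idx]
print(q)
```

Here `ties` comes out as 1, 0 and 1: row 1 shares inventor "B", row 2 shares none, and row 3 shares "C".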
I'd like progress to be printed to the screen, e.g. every 1,000 rows (there are about 20 million rows).
Is there a simple way to do this?
Answer
Try
```r
q[, ties := {
  print(.GRP)  # number of the group currently being processed
  sum(orig[pnum == origpat, inventors] %in% ref[pnum == ref.pat, inventors])
}, by = idx]
```
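`.GRP` holds the number of the group currently being processed (1 for the first group, 2 for the second, and so on), so this is analogous to `print(i)` for a by-group operation.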
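Printing on every group would produce around 20 million lines for the table in the question, and the printing itself is likely to add noticeable overhead. The following is a sketch of a variant that reports only every `every`-th group and also shows the total via `.NGRP`, the number of groups. It assumes a data.table release recent enough to provide `.NGRP`. The toy tables and the `every` variable are hypothetical, and `message()` is used instead of `print()` so that progress goes to stderr.

```r
library(data.table)

# Hypothetical toy tables (same shape as in the question)
orig <- data.table(pnum = c(1, 1, 2, 2, 2), inventors = c("A", "B", "C", "D", "E"))
ref  <- data.table(pnum = c(10, 10, 20),    inventors = c("B", "X", "C"))
q    <- data.table(idx = 1:3, origpat = c(1, 2, 2), ref.pat = c(10, 10, 20))

every <- 2L  # hypothetical reporting interval; 1000L for the real table
q[, ties := {
  # .GRP is the running group counter, .NGRP the total number of groups
  if (.GRP %% every == 0L) message("group ", .GRP, " of ", .NGRP)
  sum(orig[pnum == origpat, inventors] %in% ref[pnum == ref.pat, inventors])
}, by = idx]
```

Assuming each `idx` value identifies one row, `every <- 1000L` gives the one-report-per-1,000-rows behaviour asked for in the question. A base-R alternative is a text progress bar created with `utils::txtProgressBar()` and updated inside `j` with `setTxtProgressBar(pb, .GRP)`.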