Why is expand.grid faster than data.table's CJ?
Question
> system.time(expand.grid(1:1000,1:10000))
user system elapsed
1.65 0.34 2.03
> system.time(CJ(1:1000,1:10000))
user system elapsed
3.48 0.32 3.79
Recommended answer
Thanks for reporting this. It has now been fixed in data.table 1.8.9. Here is the timing test with the latest commit (913):
system.time(expand.grid(1:1000,1:10000))
# user system elapsed
# 1.420 0.552 1.987
system.time(CJ(1:1000,1:10000))
# user system elapsed
# 0.080 0.092 0.171
From NEWS:
CJ() is 90% faster on 1e6 rows (for example), #4849. The inputs are now sorted first before combining rather than after combining and uses rep.int instead of rep (thanks to Sean Garborg for the ideas, code and benchmark) and only sorted if is.unsorted(), #2321.
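The idea in that NEWS entry can be sketched in plain R. This is only an illustration, not data.table's actual code: `cross_join_sketch` is a hypothetical helper, and returning a `data.frame` with columns `V1` and `V2` is an assumption made for the example. It sorts each short input first (and only if `is.unsorted()` says so), then builds the columns with `rep.int`, so the large combined result never has to be sorted.

```r
# Hypothetical helper for illustration; not data.table's implementation of CJ().
cross_join_sketch <- function(x, y) {
  # Sort the (short) inputs up front, and only when they are not already sorted.
  if (isTRUE(is.unsorted(x))) x <- sort(x)
  if (isTRUE(is.unsorted(y))) y <- sort(y)
  nx <- length(x)
  ny <- length(y)
  data.frame(
    V1 = rep.int(x, times = rep.int(ny, nx)),  # each x value repeated ny times in a row
    V2 = rep.int(y, times = nx)                # the whole y vector repeated nx times
  )
}

res <- cross_join_sketch(c(3L, 1L, 2L), c("b", "a"))
print(res)
```

Sorting two inputs of length 1000 and 10000 is far cheaper than sorting the 1e7 combined rows, which is presumably where most of the reported speed-up comes from.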
Also check out NEWS for other notable features and bug fixes that have made it in; e.g., CJ() gains a new sorted argument too.