H2O running slower than data.table in R

Question

How is it possible that storing data in an H2O frame is slower than storing it in a data.table?

# Packages used: h2o and data.table
library(h2o)
library(data.table)

# h2o.init() must run before any H2O frame can be created
h2o.init(nthreads = -1)

# Create the matrices
matrix1 <- data.table(matrix(rnorm(1000 * 1000), ncol = 1000, nrow = 1000))
matrix2 <- h2o.createFrame(rows = 1000, cols = 1000)

# data.table: store a value in each cell of column 1
for (i in 1:1000) {
  matrix1[i, 1] <- 3
}

# H2O frame: store a value in each cell of column 1
for (i in 1:1000) {
  matrix2[i, 1] <- 3
}

Thanks!

Recommended answer

H2O is a client/server architecture. (See http://docs.h2o.ai/h2o/latest-stable/h2o-docs/architecture.html)

So what you've shown is a very inefficient way to fill an H2O frame in H2O memory: every write turns into a network call from the R client to the H2O server. You almost certainly don't want this.
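
A rough way to see this for yourself is to time both patterns. The sketch below assumes a local H2O cluster that h2o.init() can start or connect to; the table size and n_writes are arbitrary illustrative choices, and absolute timings will vary by machine. Calling head() inside each timed block fetches rows from the server, so the server-side work is included in both measurements.

```r
library(h2o)
library(data.table)

h2o.init(nthreads = -1)  # start or connect to a local H2O cluster

n_writes <- 50  # arbitrary; kept small because each H2O write goes through the client/server API
local_dt <- data.table(matrix(rnorm(1000 * 10), ncol = 10, nrow = 1000))

# Approach 1: cell-by-cell assignment on an H2OFrame (the question's pattern)
hf_cellwise <- as.h2o(local_dt)
t_cellwise <- system.time({
  for (i in seq_len(n_writes)) {
    hf_cellwise[i, 1] <- 3
  }
  head(hf_cellwise)  # fetches rows, so the edited frame must exist on the server
})

# Approach 2: edit the local data.table by reference, then push it in one call
t_bulk <- system.time({
  local_dt[seq_len(n_writes), V1 := 3]
  hf_bulk <- as.h2o(local_dt)
  head(hf_bulk)
})

print(c(cellwise = t_cellwise[["elapsed"]], bulk = t_bulk[["elapsed"]]))
```

Try raising n_writes to see how each approach scales.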

For your example, since the data isn't large, a reasonable approach is to do the initial assignments on a local data frame (or data.table) and then push the result to H2O in a single call with as.h2o().

# Push the whole local table to the H2O server in one call
h2o_frame = as.h2o(matrix1)
head(h2o_frame)

This pushes an R data frame from the R client into an H2O frame in H2O server memory. (To go the other way, use as.data.frame() or as.data.table() on the H2O frame.)
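
A minimal sketch of the full round trip for the question's data, assuming a local H2O cluster: the edits happen on the local data.table, as.h2o() pushes the result in one transfer, and as.data.frame() (the h2o package's method for H2OFrame) pulls it back, wrapped here in as.data.table() to get a data.table again. The name back_dt is just illustrative.

```r
library(h2o)
library(data.table)

h2o.init(nthreads = -1)

# Build and edit the data locally first (cheap, in-process)
local_dt <- data.table(matrix(rnorm(1000 * 1000), ncol = 1000, nrow = 1000))
local_dt[, V1 := 3]  # same result as the question's loop, in one vectorized step

# Push: one transfer from the R client into H2O server memory
h2o_frame <- as.h2o(local_dt)
dim(h2o_frame)

# Pull: copy the H2O frame back into R as a data.table
back_dt <- as.data.table(as.data.frame(h2o_frame))
```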

data.table Tips:

For data.table, prefer the in-place := syntax, which modifies the table by reference and avoids copies. For example:

# Set column 3 of row i to 42 by reference (no copy of matrix1)
matrix1[i, 3 := 42]
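
Applied to the question's loop, the sketch below shows two by-reference ways to update one cell at a time, plus the vectorized form that makes the loop unnecessary. It relies on data.table naming the columns of an unnamed matrix V1, V2, ..., and on set(), data.table's low-overhead, loop-friendly version of :=.

```r
library(data.table)

matrix1 <- data.table(matrix(rnorm(1000 * 1000), ncol = 1000, nrow = 1000))

# Option 1: := updates row i of column V1 in place (no copy of matrix1)
for (i in 1:1000) {
  matrix1[i, V1 := 3]
}

# Option 2: set() makes the same kind of update without the overhead
# of [.data.table, which is why it is the usual choice inside loops
for (i in 1:1000) {
  set(matrix1, i = i, j = 2L, value = 3)
}

# When the loop does nothing else, a single vectorized update is simplest
matrix1[1:1000, V3 := 3]
```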

H2O Tips:

The fastest way to read data into H2O is to ingest it with the pull method, h2o.importFile(): the H2O cluster reads the file itself, in parallel and distributed across its nodes.
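
A minimal sketch of the pull approach, assuming the H2O cluster runs on the same machine as R so that it can read a local path. The CSV is written to a temporary file with fwrite() only to keep the example self-contained; with a real dataset you would pass its path (or an HDFS/S3 URI reachable by the cluster) to h2o.importFile() directly.

```r
library(h2o)
library(data.table)

h2o.init(nthreads = -1)

# Write a sample CSV to disk (stand-in for an existing data file)
csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(matrix(rnorm(1000 * 100), ncol = 100, nrow = 1000)), csv_path)

# Pull: the H2O server reads and parses the file itself
# instead of receiving the data through the R client
hf <- h2o.importFile(csv_path, header = TRUE)
dim(hf)
```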

The as.h2o() trick shown above works well for small datasets that easily fit in the memory of one host.

If you want to watch the network messages between R and H2O, call h2o.startLogging().