Loading data bigger than the memory size in h2o


Problem description

I am experimenting with loading data bigger than the memory size in h2o.

The H2O blog says: "A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, i.e., you’re using more Big Data than physical DRAM. We won’t die with a GC death-spiral, but we will degrade to out-of-core speeds. We’ll go as fast as the disk will allow. I’ve personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to run a Logistic Regression."

Here is the R code to connect to h2o 3.6.0.8:

h2o.init(max_mem_size = '60m') # allotting 60 MB to H2O; R is running on a machine with 8 GB of RAM

gives

java version "1.8.0_65"
Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)

.Successfully connected to http://127.0.0.1:54321/ 

R is connected to the H2O cluster: 
    H2O cluster uptime:         2 seconds 561 milliseconds 
    H2O cluster version:        3.6.0.8 
    H2O cluster name:           H2O_started_from_R_RILITS-HWLTP_tkn816 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   0.06 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  2 
    H2O cluster healthy:        TRUE 

Note:  As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
           > h2o.shutdown()
           > h2o.init(nthreads = -1)

IP Address: 127.0.0.1 
Port      : 54321 
Session ID: _sid_b2e0af0f0c62cd64a8fcdee65b244d75 
Key Count : 3

I tried to load a 169 MB csv into h2o.

dat.hex <- h2o.importFile('dat.csv')

which threw an error:

Error in .h2o.__checkConnectionHealth() : 
  H2O connection has been severed. Cannot connect to instance at http://127.0.0.1:54321/
Failed to connect to 127.0.0.1 port 54321: Connection refused

which is indicative of an out-of-memory error.

Question: If H2O promises to load a data set larger than its memory capacity (the swap-to-disk mechanism that the blog quote above describes), is this the correct way to load the data?

Solution

Swap-to-disk was disabled by default a while ago, because performance was so bad. The bleeding-edge build (not the latest stable) has a flag to enable it: "--cleaner" (for "memory cleaner").

Note that your cluster has an EXTREMELY tiny memory:
H2O cluster total memory: 0.06 GB
That's 60 MB! Barely enough to start a JVM with, much less run H2O. I would be surprised if H2O could come up properly there at all, never mind the swap-to-disk. Swapping is limited to swapping the data alone. If you're trying to do a swap test, raise your JVM heap to 1 or 2 GB of RAM, and then load datasets that sum to more than that.

Cliff
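
As a rough illustration of the advice above, the following R sketch compares a file's size with the requested heap before starting H2O with a larger max_mem_size. The helpers mem_to_bytes and import_with_heap are hypothetical names made up for this example and are not part of the h2o package; only h2o.init (with max_mem_size and nthreads) and h2o.importFile are real h2o calls. The sketch does not enable swap-to-disk, which according to the answer is off by default.

```r
# Hypothetical helper: convert an H2O-style memory string such as "60m"
# or "2g" to bytes. A k, m, or g suffix is assumed to be present.
mem_to_bytes <- function(x) {
  units <- c(k = 1024, m = 1024^2, g = 1024^3)
  num <- as.numeric(sub("[kmgKMG]$", "", x))   # numeric part, e.g. 60
  unit <- tolower(sub("^[0-9.]+", "", x))      # suffix, e.g. "m"
  num * units[[unit]]
}

# Hypothetical wrapper: start a local H2O node with the given heap and
# import a file, noting when the raw file is larger than the heap.
import_with_heap <- function(path, heap = "2g") {
  if (file.size(path) > mem_to_bytes(heap)) {
    message("File is larger than the requested heap; per the answer above, ",
            "this would need swap-to-disk, which is disabled by default.")
  }
  h2o::h2o.init(max_mem_size = heap, nthreads = -1)
  h2o::h2o.importFile(path)
}
```

With the h2o package installed, a call such as import_with_heap("dat.csv", heap = "2g") would give the 169 MB file from the question a heap far larger than the original 60 MB, which works out to mem_to_bytes("60m") = 62914560 bytes.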
