Is there an efficient way to convert a Pandas DataFrame to an H2O Frame?


Question

I have a Pandas data frame that I need to convert to an H2O frame. I use the following code:

Code:

import time
import h2o

# Convert the pandas DataFrame to an H2O frame and time the conversion
start_time = time.time()
input_data_matrix = h2o.H2OFrame(input_df)
logger.debug("3. Time taken to convert H2O Frame- " + str(time.time() - start_time))

Output:

2019-02-05 04:38:55,238 logger DEBUG 3. Time taken to convert H2O Frame- 9320.119945764542

The data frame (i.e. input_df) is 183K x 435 in size, with no null or NaN values.

The conversion takes more than 2 hours (about 9,320 seconds according to the log line). Is there a better way to perform this operation?

Recommended answer

  1. Save the pandas data frame to a csv file. (Skip this step if you loaded it from a csv file in the first place and haven't done any data munging on it.)

  2. Put that csv file somewhere the h2o server can see it. (If you are running the client and server on the same machine, this is already the case.)

  3. Use h2o.import_file() (in preference to h2o.upload_file() or h2o.H2OFrame()).

h2o.import_file() is the quickest way to get data into H2O, but the file must be visible to the server. When dealing with a remote cluster, this might mean uploading it to that server's file system, or putting it on a web server, an HDFS cluster, AWS S3, and so on.
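The three steps above can be sketched roughly as follows. This is a minimal sketch, assuming the client and the H2O server run on the same machine so that a local path is visible to the server. The helper names export_frame_to_csv and import_csv_into_h2o are made up for illustration and are not part of pandas or H2O.

```python
import os

import pandas as pd


def export_frame_to_csv(df, csv_path):
    """Step 1: write the pandas DataFrame to a csv file (hypothetical helper)."""
    # index=False keeps the pandas row index out of the file,
    # so H2O does not pick it up as an extra column.
    df.to_csv(csv_path, index=False)
    # Return an absolute path; the server resolves the path, not the client.
    return os.path.abspath(csv_path)


def import_csv_into_h2o(csv_path):
    """Steps 2 and 3: let the H2O server read the file itself (hypothetical helper)."""
    # Imported lazily so the csv step can be used without H2O installed.
    import h2o

    # Assumption: a local server is acceptable. h2o.init() connects to a
    # local H2O server, starting one if needed. For a remote cluster you
    # would connect to that cluster instead and place the file where it can see it.
    h2o.init()
    return h2o.import_file(path=csv_path)
```

With the data frame from the question, usage might look like import_csv_into_h2o(export_frame_to_csv(input_df, "input_df.csv")), where the file name is only an example.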

(h2o.upload_file() is slower because it does an HTTP POST of the data from client to server. h2o.H2OFrame() is slower still because it exports your pandas data to a temp csv file, then uses h2o.upload_file(), then deletes the temp file afterwards.)
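To make the extra work visible, here is an illustrative sketch of the round trip the answer describes for h2o.H2OFrame(): write a temp csv, upload it, delete it. It is not H2O's actual source code. The function name upload_via_temp_csv and its upload parameter are hypothetical; the parameter exists only so the sketch can be tried without a running cluster.

```python
import os
import tempfile

import pandas as pd


def upload_via_temp_csv(df, upload=None):
    """Mimic the temp-file round trip described in the answer (illustrative only)."""
    if upload is None:
        # Imported lazily; by default the file is pushed over HTTP
        # from the client to the server with h2o.upload_file().
        import h2o

        upload = h2o.upload_file

    # 1. Export the pandas data to a temporary csv file.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        df.to_csv(tmp_path, index=False)
        # 2. Upload the temporary file from the client to the server.
        return upload(tmp_path)
    finally:
        # 3. Delete the temporary file afterwards.
        os.remove(tmp_path)
```

Compared with h2o.import_file(), this path serializes the data on the client and then sends every byte through the client connection, which is where the answer locates the slowdown.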
