How to use H2O on a feature-hashed matrix in R


Question

I am working on a moderately sized data set (train_data) with more than 124 variables and 50,00,000 observations. For the categorical variables, I applied feature hashing using the hashed.model.matrix function in R.

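## feature hashing
library(FeatureHashing)  # provides hashed.model.matrix

b <- 2 ^ 22              # hash size
f <- ~ . - 1             # all variables, no intercept
X_train <- hashed.model.matrix(f, train_data, hash.size = b)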

As a result, I get a large dgCMatrix (a sparse matrix) as output (X_train). How can I use the H2O R wrapper on this matrix and run the different algorithms available in H2O? Does the H2O wrapper accept a sparse matrix (dgCMatrix)? Any link or example of such usage would be helpful. Thanks in anticipation.

I am hoping to import X_train into the H2O environment and then follow steps along these lines:

# initialize connection to H2O server
library(h2o)
h2o.init(nthreads = -1)
train.hex <- h2o.uploadFile('./X_train', destination_frame = 'train')

# list of features for training (every column except the response)
feature.names <- setdiff(names(train.hex), 'outcome')

# train random forest model, use ntrees = 500
drf <- h2o.randomForest(x = feature.names, y = 'outcome', training_frame = train.hex, ntrees = 500)

Recommended answer

You could save your sparse matrix in SVMLight sparse format, then use

train.hex <- h2o.uploadFile('./X_train', parse_type = "SVMLight", destination_frame='train')

SVMLight sparse format is also detected by h2o.importFile(). It is a parallelized reader: the H2O server pulls the data itself from a location specified by the client, so the path must be readable from the server.

train.hex <- h2o.importFile('./X_train', destination_frame='train')
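The answer does not show how to get from a dgCMatrix to an SVMLight file. The block below is a minimal sketch of that step, using only the Matrix package. The helper name write_svmlight_sketch is hypothetical, the label vector y is assumed to be available separately (for example train_data$outcome), and feature indices are written 1-based and in ascending order within each row. It builds every line in memory, so for millions of rows it would likely need to be applied in chunks or replaced by a dedicated writer.

```r
library(Matrix)

# Hypothetical helper (not part of h2o or Matrix): write a sparse matrix X
# and a label vector y to `path`, one "label index:value ..." line per row.
write_svmlight_sketch <- function(X, y, path) {
  stopifnot(nrow(X) == length(y))

  # Non-zero entries as (i, j, x) triplets with 1-based indices.
  trip <- summary(X)

  # Sort by row, then by column, so indices ascend within each line.
  ord    <- order(trip$i, trip$j)
  rows   <- trip$i[ord]
  tokens <- paste0(trip$j[ord], ":", trip$x[ord])

  # Collapse the tokens of each row; rows without non-zeros stay empty.
  feats  <- character(nrow(X))
  by_row <- split(tokens, rows)
  feats[as.integer(names(by_row))] <-
    vapply(by_row, paste, character(1), collapse = " ")

  lines <- ifelse(nzchar(feats), paste(y, feats), as.character(y))
  writeLines(lines, path)
  invisible(lines)
}

# Small usage example with made-up data.
X_demo <- sparseMatrix(i = c(1, 1, 3), j = c(1, 3, 2), x = c(1, 2, 5),
                       dims = c(3, 4))
y_demo <- c(1, 0, 1)
demo_path <- tempfile(fileext = ".svm")
write_svmlight_sketch(X_demo, y_demo, demo_path)
```

The resulting file can then be passed to h2o.uploadFile() with parse_type = "SVMLight" as shown above. After parsing, check names(train.hex), since the column names H2O assigns may not include 'outcome'.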
