How to use H2O on a feature-hashed matrix in R
Question
I am working on a moderate data set (train_data) with more than 124 variables and 5,000,000 observations. For the categorical variables, I have applied feature hashing with the hashed.model.matrix function in R.
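library(FeatureHashing)  # provides hashed.model.matrix()

## feature hashing
b <- 2 ^ 22
f <- ~ . - 1
# drop the response before hashing; otherwise `~ . - 1` hashes it into the features too
X_train <- hashed.model.matrix(f, train_data[, setdiff(names(train_data), "outcome")], hash.size = b)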
As a result, I get a large dgCMatrix (a sparse matrix) as output (X_train). How can I use the H2O wrapper on this matrix and apply the different algorithms available in H2O? Does the H2O wrapper accept a sparse matrix (dgCMatrix)? Any link or example of such usage would be helpful. Thanks in advance.
The kind of steps I expect for importing X_train into the H2O environment:
library(h2o)

# initialize connection to H2O server
h2o.init(nthreads = -1)
train.hex <- h2o.uploadFile('./X_train', destination_frame = 'train')
# list of features for training (every column except the response)
feature.names <- setdiff(names(train.hex), 'outcome')
# train random forest model, use ntrees = 500
drf <- h2o.randomForest(x = feature.names, y = 'outcome', training_frame = train.hex, ntrees = 500)
Answer
You could save your sparse matrix in SVMLight sparse format, then use:
train.hex <- h2o.uploadFile('./X_train', parse_type = "SVMLight", destination_frame='train')
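The answer does not show how to write the dgCMatrix in SVMLight format. Below is a minimal sketch that assumes X_train is a dgCMatrix and the labels are stored in a separate vector of the same length. The helper write_svmlight() is hypothetical and is not part of H2O or FeatureHashing. It relies only on the Matrix package. Each output line follows the `<label> <index>:<value> ...` pattern, with 1-based feature indices and zero entries omitted.

```r
library(Matrix)

# Hypothetical helper (not an H2O or FeatureHashing function): writes a sparse
# matrix X plus a label vector y to `path` in SVMLight format.
write_svmlight <- function(X, y, path) {
  stopifnot(length(y) == nrow(X))
  # Transposing a column-compressed matrix gives another column-compressed
  # matrix whose column r holds the non-zeros of row r of X.
  Xt <- as(t(drop0(X)), "CsparseMatrix")
  lines <- vapply(seq_len(nrow(X)), function(r) {
    # positions of row r's non-zeros inside the @i / @x slots
    k <- seq.int(Xt@p[r] + 1L, length.out = Xt@p[r + 1L] - Xt@p[r])
    if (length(k) == 0) return(as.character(y[r]))  # row with no features
    # @i is 0-based, while SVMLight feature indices start at 1
    feats <- paste0(Xt@i[k] + 1L, ":", Xt@x[k], collapse = " ")
    paste(y[r], feats)
  }, character(1))
  writeLines(lines, path)
  invisible(path)
}
```

For example, `write_svmlight(X_train, train_data$outcome, "./X_train")` would write a file that the `h2o.uploadFile()` call above can parse. This assumes the outcome column was left out of the hashed features. In the parsed frame, the label should be the first column, so pass it to the algorithm as `y` by index rather than by the name 'outcome'. The per-row loop keeps the sketch simple but will be slow for millions of rows.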
The SVMLight sparse format is also detected by h2o.importFile(). This is a parallelized reader: the H2O server reads the data directly from the location the client specifies, so that path must be accessible from the server.
train.hex <- h2o.importFile('./X_train', destination_frame='train')