XGBoost and sparse matrix

Problem description

I am trying to use xgboost from Python on a classification problem. The data are in a numpy matrix X (rows = observations, columns = features) and the labels are in a numpy array y. Because my data are sparse, I would like to train on a sparse version of X, but I seem to be missing something, because an error occurs.

Here is what I did:

# Library import

import numpy as np
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from scipy.sparse import csr_matrix

# Converting to sparse data and running xgboost

X_csr = csr_matrix(X)
xgb1 = XGBClassifier()
xgtrain = xgb.DMatrix(X_csr, label = y )      #to work with the xgb format
xgtest = xgb.DMatrix(Xtest_csr)
xgb1.fit(xgtrain, y, eval_metric='auc')
dtrain_predictions = xgb1.predict(xgtest)   

and so on...

Now I get an error when trying to fit the classifier:

File ".../xgboost/python-package/xgboost/sklearn.py", line 432, in fit
self._features_count = X.shape[1]

AttributeError: 'DMatrix' object has no attribute 'shape'

I spent a while looking for where it could come from, and I believe it has to do with the sparse format I want to use. But what the cause is, and how I could fix it, I have no clue.

I would welcome any help or comments! Thank you very much.

Recommended answer

You are using the xgboost scikit-learn API (http://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn), so you don't need to convert your data to a DMatrix to fit the XGBClassifier(). As the traceback shows, its fit method reads X.shape, which a DMatrix does not have, hence the AttributeError. The sparse format itself is not the problem. Just removing the line

xgtrain = xgb.DMatrix(X_csr, label = y )

should work:

type(X_csr) #scipy.sparse.csr.csr_matrix
type(y) #numpy.ndarray
xgb1 = xgb.XGBClassifier()
xgb1.fit(X_csr, y)

Output:

XGBClassifier(base_score=0.5, colsample_bylevel=1, colsample_bytree=1,
   gamma=0, learning_rate=0.1, max_delta_step=0, max_depth=3,
   min_child_weight=1, missing=None, n_estimators=100, nthread=-1,
   objective='binary:logistic', reg_alpha=0, reg_lambda=1,
   scale_pos_weight=1, seed=0, silent=True, subsample=1)
