Python - LightGBM with GridSearchCV is running forever


Question

Recently, I have been running several experiments to compare XGBoost and LightGBM in Python. LightGBM seems to be a newer algorithm that people say outperforms XGBoost in both speed and accuracy.

This is the LightGBM GitHub repository. This is the LightGBM Python API documentation, where you will find the Python functions you can call. They can be called directly on a LightGBM model or through the LightGBM scikit-learn wrapper.

This is the XGBoost Python API I use. As you can see, its data structures are very similar to those of the LightGBM Python API above.

Here is what I tried:

  1. If you use the train() method in both XGBoost and LightGBM, then yes, LightGBM runs faster and is more accurate. But this method does not do cross-validation.
  2. If you try the cv() method in both libraries, you get cross-validation. However, I did not find a way to make it return a set of optimum parameters.
  3. If you try scikit-learn's GridSearchCV() with LGBMClassifier and XGBClassifier, it works for XGBClassifier, but for LGBMClassifier it runs forever.

Here are my code examples using GridSearchCV() with both classifiers:

XGBClassifier with GridSearchCV

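from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Search over the number of boosting rounds only.
param_set = {
    'n_estimators': [50, 100, 500, 1000]
}
# Note: iid and grid_scores_ exist only in older scikit-learn releases;
# newer releases drop iid and expose cv_results_ instead.
gsearch = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1, n_estimators=100, max_depth=5,
        min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8,
        nthread=7, objective='binary:logistic', scale_pos_weight=1, seed=410),
    param_grid=param_set, scoring='roc_auc', n_jobs=7, iid=False, cv=10)

xgb_model2 = gsearch.fit(features_train, label_train)
xgb_model2.grid_scores_, xgb_model2.best_params_, xgb_model2.best_score_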

This works very well for XGBoost and took only a few seconds.

LightGBM with GridSearchCV

from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

# Search over the number of boosting rounds only.
param_set = {
    'n_estimators': [20, 50]
}

gsearch = GridSearchCV(
    estimator=LGBMClassifier(
        boosting_type='gbdt', num_leaves=30, max_depth=5, learning_rate=0.1,
        n_estimators=50, max_bin=225, subsample_for_bin=0.8, objective=None,
        min_split_gain=0, min_child_weight=5, min_child_samples=10,
        subsample=1, subsample_freq=1, colsample_bytree=1,
        reg_alpha=1, reg_lambda=0, seed=410, nthread=7, silent=True),
    param_grid=param_set, scoring='roc_auc', n_jobs=7, iid=False, cv=10)

lgb_model2 = gsearch.fit(features_train, label_train)
lgb_model2.grid_scores_, lgb_model2.best_params_, lgb_model2.best_score_

However, with LightGBM this same approach has been running the whole morning today and has still produced nothing.

I am using the same dataset in both cases; it contains 30,000 records.

I have 2 questions:

  1. If we just use the cv() method, is there any way to tune an optimum set of parameters?
  2. Do you know why GridSearchCV() does not work well with LightGBM? I'm wondering whether this only happens to me or whether it has happened to others too.


Recommended answer

Try using n_jobs = 1 and see if it works.

In general, if you use n_jobs = -1 or n_jobs > 1, you should protect your script with if __name__ == '__main__': so that worker processes that re-import the script (as happens on platforms such as Windows) do not re-run the search themselves.

A simple example:

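import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

if __name__ == '__main__':

    # Columns 0-25 are the features; column 26 is the label.
    data = pd.read_csv('Prior Decompo2.csv', header=None)
    X, y = data.iloc[0:, 0:26].values, data.iloc[0:, 26].values
    param_grid = {'C': [0.01, 0.1, 1, 10], 'kernel': ('rbf', 'linear')}
    classifier = SVC()
    # n_jobs=-1 uses all available cores; verbose=42 prints progress for each fit.
    grid_search = GridSearchCV(estimator=classifier, param_grid=param_grid, scoring='accuracy', n_jobs=-1, verbose=42)
    grid_search.fit(X, y)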

Finally, can you try running your code with n_jobs = -1 inside an if __name__ == '__main__': block, as I explained, and see if it works?
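To make the suggestion concrete for the LightGBM case, here is a minimal sketch of the asker's search with n_jobs = 1 on GridSearchCV and the main guard in place. The helper name build_search, the synthetic data from make_classification, and the reduced set of LGBMClassifier arguments are assumptions made for illustration; they are not the asker's actual data or full configuration, and this is not guaranteed to be the fix for every setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV


def build_search(estimator, param_grid, n_jobs=1, cv=10):
    """Hypothetical helper: grid search scored by ROC AUC, as in the question."""
    return GridSearchCV(estimator=estimator, param_grid=param_grid,
                        scoring='roc_auc', n_jobs=n_jobs, cv=cv)


def main():
    # Imported lazily so build_search() stays usable without LightGBM installed.
    from lightgbm import LGBMClassifier

    # Synthetic stand-in for the asker's dataset (an assumption).
    X, y = make_classification(n_samples=2000, n_features=20, random_state=410)

    estimator = LGBMClassifier(num_leaves=30, max_depth=5,
                               learning_rate=0.1, random_state=410)
    # n_jobs=1: the grid search itself runs in a single process.
    search = build_search(estimator, {'n_estimators': [20, 50]}, n_jobs=1)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)


if __name__ == '__main__':
    main()
```

If this finishes quickly with n_jobs=1, the same script can be retried with n_jobs=-1 passed to build_search, keeping the guard, to check whether the parallel run still hangs.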
