XGBoost Spark One Model Per Worker Integration
Question
I am using Spark 2.4.3 and XGBoost 0.90.
I keep getting the error ValueError: bad input shape () when trying to execute ...
features = inputTrainingDF.select("features").collect()
lables = inputTrainingDF.select("label").collect()
X = np.asarray(map(lambda v: v[0].toArray(), features))
Y = np.asarray(map(lambda v: v[0], lables))
xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
model = xgbClassifier.fit(X, Y)
ValueError: bad input shape ()
and
def trainXGbModel(partitionKey, labelAndFeatures):
    X = np.asarray(map(lambda v: v[1].toArray(), labelAndFeatures))
    Y = np.asarray(map(lambda v: v[0], labelAndFeatures))
    xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
    model = xgbClassifier.fit(X, Y)
    return [partitionKey, model]

xgbModels = inputTrainingDF\
    .select("education", "label", "features")\
    .rdd\
    .map(lambda row: [row[0], [row[1], row[2]]])\
    .groupByKey()\
    .map(lambda v: trainXGbModel(v[0], list(v[1])))
xgbModels.take(1)
ValueError: bad input shape ()
You can see in the notebook that it works for whoever posted it. My guess is that the problem lies in the np.asarray() mapping for X and Y: the logic is just trying to map the labels and features into the function, but the resulting shapes are empty. I got it working using this code:
pandasDF = inputTrainingDF.toPandas()
series = pandasDF['features'].apply(lambda x : np.array(x.toArray())).as_matrix().reshape(-1,1)
features = np.apply_along_axis(lambda x : x[0], 1, series)
target = pandasDF['label'].values
xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic' )
model = xgbClassifier.fit(features, target)
However, I want to integrate this into the original function call and understand why the original notebook does not work. An extra set of eyes to troubleshoot this would be much appreciated!
Recommended answer
You are probably using Python 3. The issue is that in Python 3 the map function returns an iterator object rather than a list. When np.asarray receives that iterator, it wraps the iterator itself in a zero-dimensional object array, so the shape is () and fit raises the error. All you have to do to fix this example is change map(...) to list(map(...)):
def trainXGbModel(partitionKey, labelAndFeatures):
    X = np.asarray(list(map(lambda v: v[1].toArray(), labelAndFeatures)))
    Y = np.asarray(list(map(lambda v: v[0], labelAndFeatures)))
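To see the difference without Spark or XGBoost, here is a minimal sketch using only NumPy. The rows list is a hypothetical stand-in for the (label, features) pairs of one group; plain Python lists replace the Spark vectors, so no toArray() call is needed.

```python
import numpy as np

# Hypothetical stand-in for the (label, features) pairs of one group.
rows = [(0, [1.0, 2.0]), (1, [3.0, 4.0]), (0, [5.0, 6.0])]

# Python 3: map() returns a lazy iterator, and NumPy wraps the iterator
# object itself in a zero-dimensional array of dtype object.
lazy = np.asarray(map(lambda v: v[1], rows))
print(lazy.shape, lazy.dtype)

# Materialising the iterator first gives the expected arrays.
X = np.asarray(list(map(lambda v: v[1], rows)))
Y = np.asarray(list(map(lambda v: v[0], rows)))
print(X.shape, Y.shape)
```

The first array has an empty shape, which is what the error message reports; the second pair has one row per sample, which is what fit expects.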
Alternatively, you can use np.fromiter to convert an iterable into a NumPy array.
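As a rough sketch of that alternative: np.fromiter requires an explicit dtype and is a natural fit for a one-dimensional array such as the labels. For the two-dimensional feature matrix, this sketch stacks the per-row vectors with np.stack instead, which is a choice made here and not part of the original answer. The rows list and the dtypes are assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in for the (label, features) pairs of one group.
rows = [(0, [1.0, 2.0]), (1, [3.0, 4.0]), (0, [5.0, 6.0])]

# Labels: np.fromiter consumes the generator directly; dtype is mandatory,
# and count lets NumPy preallocate the output.
Y = np.fromiter((v[0] for v in rows), dtype=np.int64, count=len(rows))

# Features: stack one 1-D vector per row into a 2-D matrix.
X = np.stack([np.asarray(v[1], dtype=np.float64) for v in rows])

print(Y.shape, X.shape)
```

With real Spark rows, v[1] would be replaced by v[1].toArray() as in the function above.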