XGBoost Spark One Model Per Worker Integration
Problem description
I am trying to work through this notebook: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1526931011080774/3624187670661048/6320440561800420/latest.html. I am using Spark 2.4.3 and xgboost 0.90.

I keep getting this error:

ValueError: bad input shape ()

when trying to execute:

features = inputTrainingDF.select("features").collect()
lables = inputTrainingDF.select("label").collect()
X = np.asarray(map(lambda v: v[0].toArray(), features))
Y = np.asarray(map(lambda v: v[0], lables))
xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
model = xgbClassifier.fit(X, Y)

ValueError: bad input shape ()

and

def trainXGbModel(partitionKey, labelAndFeatures):
    X = np.asarray(map(lambda v: v[1].toArray(), labelAndFeatures))
    Y = np.asarray(map(lambda v: v[0], labelAndFeatures))
    xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
    model = xgbClassifier.fit(X, Y)
    return [partitionKey, model]

xgbModels = inputTrainingDF\
    .select("education", "label", "features")\
    .rdd\
    .map(lambda row: [row[0], [row[1], row[2]]])\
    .groupByKey()\
    .map(lambda v: trainXGbModel(v[0], list(v[1])))

xgbModels.take(1)

ValueError: bad input shape ()

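You can see in the notebook that it works for whoever posted it. My guess is that it has something to do with the X and Y np.asarray() mapping, because the logic is just trying to map the label and features to the function, but the shapes are empty. I got it working using this code:

pandasDF = inputTrainingDF.toPandas()
series = pandasDF['features'].apply(lambda x : np.array(x.toArray())).as_matrix().reshape(-1,1)
features = np.apply_along_axis(lambda x : x[0], 1, series)
target = pandasDF['label'].values
xgbClassifier = xgb.XGBClassifier(max_depth=3, seed=18238, objective='binary:logistic')
model = xgbClassifier.fit(features, target)

However, I want to integrate this into the original function call and understand why the original notebook does not work. An extra set of eyes to troubleshoot this would be much appreciated!
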
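
A note for readers on newer library versions: the pandas workaround relies on Series.as_matrix(), which was removed in pandas 1.0. The same feature matrix can probably be built with plain NumPy. The following is only a minimal sketch under stated assumptions: FakeVector, feature_column and label_column are hypothetical stand-ins (not part of the notebook) that mimic collected Spark ML vectors exposing toArray(), so the snippet runs without Spark.

```python
import numpy as np


class FakeVector:
    """Hypothetical stand-in for a Spark ML vector; only toArray() is mimicked."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=float)

    def toArray(self):
        return self._values


# Assumed sample data standing in for the collected "features" and "label" columns.
feature_column = [FakeVector([1.0, 2.0, 3.0]), FakeVector([4.0, 5.0, 6.0])]
label_column = [0, 1]

# Stack the per-row 1-D arrays into one 2-D matrix (rows x features).
features = np.stack([v.toArray() for v in feature_column])
target = np.asarray(label_column)

print(features.shape)
print(target.shape)
```

The list comprehension materialises every row before NumPy sees it, so the result is a regular 2-D float array that a scikit-learn style fit(features, target) call should accept.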
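Recommended answer

You are probably using Python 3. The issue is that in Python 3 the map function returns an iterator object rather than a collection, so np.asarray() wraps the iterator itself instead of its elements. All you have to do to fix this example is change map -> list(map(...)):

def trainXGbModel(partitionKey, labelAndFeatures):
    X = np.asarray(list(map(lambda v: v[1].toArray(), labelAndFeatures)))
    Y = np.asarray(list(map(lambda v: v[0], labelAndFeatures)))
    # ... the rest of the function is unchanged

Or you can use np.fromiter to convert an iterable object to a numpy array.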
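
To see the effect in isolation, here is a small sketch that reproduces the empty shape without Spark or xgboost. The label_and_features list is a made-up stand-in for one grouped partition, with plain NumPy rows in place of v[1].toArray(). The np.fromiter call is shown only for the 1-D label vector, where it needs an explicit dtype; for the 2-D feature matrix, list(map(...)) is the simpler route.

```python
import numpy as np

# Hypothetical stand-in for one grouped partition: (label, feature-row) pairs.
label_and_features = [
    (0, np.array([1.0, 2.0])),
    (1, np.array([3.0, 4.0])),
    (0, np.array([5.0, 6.0])),
]

# Python 3: map() returns a lazy iterator, so np.asarray() wraps the iterator
# object itself in a 0-dimensional object array.
broken_X = np.asarray(map(lambda v: v[1], label_and_features))
print(broken_X.shape)  # ()

# Materialising the iterator first gives the expected 2-D feature matrix.
X = np.asarray(list(map(lambda v: v[1], label_and_features)))
print(X.shape)  # (3, 2)

# np.fromiter consumes the iterator directly; it suits the 1-D label vector.
Y = np.fromiter(map(lambda v: v[0], label_and_features), dtype=np.int64)
print(Y.shape)  # (3,)
```

The empty shape () on broken_X is what a fit(X, Y) call then rejects, which matches the "bad input shape ()" message in the question.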