Difference between org.apache.spark.ml.classification and org.apache.spark.mllib.classification

Question

I'm writing a Spark application and would like to use algorithms in MLlib. In the API docs I found two different classes for the same algorithm.

For example, there is a LogisticRegression in org.apache.spark.ml.classification and also a LogisticRegressionWithSGD in org.apache.spark.mllib.classification.

The only difference I can find is that the one in org.apache.spark.ml inherits from Estimator and can be used in cross-validation. I'm quite confused about why they are placed in different packages. Does anyone know the reason? Thanks!

Answer

See the corresponding JIRA ticket.

From the design document:

MLlib now covers a basic selection of machine learning algorithms, e.g., logistic regression, decision trees, alternating least squares, and k-means. The current set of APIs contains several design flaws that prevent us from moving forward to address practical machine learning pipelines and from making MLlib itself a scalable project.

The new set of APIs will live under org.apache.spark.ml, and o.a.s.mllib will be deprecated once we migrate all features to o.a.s.ml.
