Difference between org.apache.spark.ml.classification and org.apache.spark.mllib.classification
Question
I'm writing a Spark application and would like to use algorithms in MLlib. In the API docs I found two different classes for the same algorithm.
For example, there is a LogisticRegression in org.apache.spark.ml.classification and also a LogisticRegressionWithSGD in org.apache.spark.mllib.classification.
The only difference I can find is that the one in org.apache.spark.ml inherits from Estimator and can be used in cross-validation. I'm confused about why they are placed in different packages. Does anyone know the reason? Thanks!
Recommended answer
There is a JIRA ticket for this.
From the design doc:
MLlib now covers a basic selection of machine learning algorithms, e.g., logistic regression, decision trees, alternating least squares, and k-means. The current set of APIs contains several design flaws that prevent us moving forward to address practical machine learning pipelines, make MLlib itself a scalable project.
The new set of APIs will live under org.apache.spark.ml, and o.a.s.mllib will be deprecated once we migrate all features to o.a.s.ml.
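
To make the difference the question describes more concrete, here is a minimal sketch of the two packages side by side. It is an illustration under stated assumptions, not something taken from the design doc: it assumes Spark 2.x or later (for SparkSession and org.apache.spark.ml.linalg), the toy data and the object name MlVsMllibSketch are made up, and it uses LogisticRegressionWithLBFGS from mllib rather than the LogisticRegressionWithSGD named in the question, since the SGD variant is, as far as I know, deprecated in later releases.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.{Vectors => MlVectors}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.{Vectors => MllibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object MlVsMllibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ml-vs-mllib-sketch")
      .master("local[*]")
      .getOrCreate()

    // Made-up toy data: (label, two features), 40 rows, both classes present.
    val rows = (0 until 40).map { i =>
      val label = (i % 2).toDouble
      (label, Array(label + 0.1 * (i % 5), 1.0 - label))
    }

    // org.apache.spark.ml: works on DataFrames. LogisticRegression is an
    // Estimator, so it can be handed to CrossValidator for model selection.
    val df = spark.createDataFrame(
      rows.map { case (label, xs) => (label, MlVectors.dense(xs)) }
    ).toDF("label", "features")

    val lr = new LogisticRegression().setMaxIter(10)
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .build()
    val cv = new CrossValidator()
      .setEstimator(lr)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(2)
    val cvModel = cv.fit(df)
    cvModel.transform(df).select("label", "prediction").show(5)

    // org.apache.spark.mllib: works on RDD[LabeledPoint]. The algorithm
    // object trains directly via run(...); it is not an Estimator.
    val rdd = spark.sparkContext.parallelize(
      rows.map { case (label, xs) => LabeledPoint(label, MllibVectors.dense(xs)) }
    )
    val mllibModel = new LogisticRegressionWithLBFGS().setNumClasses(2).run(rdd)
    println(mllibModel.predict(MllibVectors.dense(1.0, 0.0)))

    spark.stop()
  }
}
```

Note that the two packages have separate vector types (org.apache.spark.ml.linalg versus org.apache.spark.mllib.linalg), which is why the sketch imports both under different aliases.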