How to use NNLS for non-negative multiple linear regression?
Question
I am trying to solve a non-negative multiple linear regression problem in Java. I found a solver class, org.apache.spark.mllib.optimization.NNLS, written in Scala. However, I don't know how to use it.
What confuses me is that the interface of the following method seems strange. I thought that A is an MxN matrix and b is an M-vector, so the arguments ata and atb should be an NxN matrix and an N-vector, respectively. However, the actual type of ata is double[].
public static double[] solve(double[] ata, double[] atb, NNLS.Workspace ws)
I searched for example code but couldn't find any. Can anyone give me a sample? The library is written in Scala, but I would like Java code if possible.
Recommended answer
DISCLAIMER: I've never used NNLS and have no idea about non-negative multiple linear regression.
You are looking at Spark 2.1.1's NNLS, which does what you want, but it is not the way to go, since in Spark 2.2.1 (the latest at the time of writing) it is marked as private[spark].
private[spark] object NNLS {
More importantly, as of Spark 2.0, the org.apache.spark.mllib package (incl. org.apache.spark.mllib.optimization, which NNLS belongs to) is in maintenance mode:
The MLlib RDD-based API is now in maintenance mode.
As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package.
In other words, you should stay away from the package, and from NNLS in particular.
So what are the alternatives?
You could look at the tests of NNLS, i.e. NNLSSuite, where you could find some answers.
However, the actual type of ata is double[].
It is still a matrix, just stored as a flat array, so its elements are doubles again. As a matter of fact, ata is passed directly to BLAS's dgemv (in two places in the NNLS source), which is described in the LAPACK docs:
DGEMV performs one of the matrix-vector operations
y := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y,
where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
That should give you enough answers.
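To make the flat-array point concrete: the least-squares objective ||Ax - b||^2 depends on A and b only through A^T A and A^T b, which is presumably why the solver takes ata and atb rather than A and b. The following is a minimal sketch of how those two arrays could be built in plain Java. The class name NormalEquations and its methods are hypothetical helpers, not part of Spark, and the column-major layout is an assumption based on the BLAS convention (A^T A is symmetric, so row-major gives the same array).

```java
// Hypothetical helper (not part of Spark): builds the flattened inputs
// for a solver with the signature solve(double[] ata, double[] atb, ...).
final class NormalEquations {

    /** Returns A^T A as a flat array of length n*n; entry (i, j) is at index i + j*n. */
    static double[] ata(double[][] a) {
        int m = a.length;
        int n = a[0].length;
        double[] out = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < m; k++) {
                    sum += a[k][i] * a[k][j];
                }
                out[i + j * n] = sum; // assumed column-major, as BLAS expects
            }
        }
        return out;
    }

    /** Returns A^T b as an array of length n. */
    static double[] atb(double[][] a, double[] b) {
        int m = a.length;
        int n = a[0].length;
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < m; k++) {
                out[i] += a[k][i] * b[k];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}}; // M = 3 rows, N = 2 columns
        double[] b = {1, 2, 3};
        System.out.println(java.util.Arrays.toString(ata(a)));
        System.out.println(java.util.Arrays.toString(atb(a, b)));
    }
}
```

With Spark 2.1.1 on the classpath, these two arrays are what would be handed to the solve method from the question, together with a workspace sized for N. How the workspace is obtained is best checked in NNLSSuite rather than taken from this sketch.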
Another question is: what is the recommended way to do NNLS-like computations in Spark MLlib?
It looks like Spark MLlib's ALS algorithm uses NNLS under the covers (which may not be that surprising for machine learning practitioners).
That part of the code is used when ALS is configured to train a model with the nonnegative parameter turned on, i.e. set to true (it is disabled by default).
nonnegative Param for whether to apply nonnegativity constraints.
Default: false
whether to use nonnegative constraint for least squares
I would recommend reviewing that part of Spark MLlib to get deeper into how NNLS is used for solving non-negative linear regression problems.
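For readers who only need a small non-negative least-squares solve and want to avoid the private Spark class, here is a rough, self-contained sketch in plain Java. It takes the same flattened ata (N*N) and atb (N) inputs discussed above and runs a basic projected gradient iteration. This is not Spark's implementation; the class name ProjectedGradientNnls, the step-size choice (1 / trace of A^T A) and the stopping rule are all assumptions made for illustration.

```java
// Illustrative only: a basic projected gradient method for
//   minimize 1/2 x^T (A^T A) x - (A^T b)^T x   subject to x >= 0.
// This is NOT the algorithm used by Spark's NNLS class.
final class ProjectedGradientNnls {

    static double[] solve(double[] ata, double[] atb, int maxIter, double tol) {
        int n = atb.length;

        // The trace of A^T A bounds its largest eigenvalue from above
        // (A^T A is positive semidefinite), so 1/trace is a safe step size.
        double trace = 0.0;
        for (int i = 0; i < n; i++) {
            trace += ata[i + i * n];
        }
        double[] x = new double[n];
        if (trace <= 0.0) {
            return x; // A is all zeros, so x = 0 is already optimal
        }
        double step = 1.0 / trace;

        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double maxChange = 0.0;
            for (int i = 0; i < n; i++) {
                // Gradient component: (A^T A x - A^T b)_i
                double grad = -atb[i];
                for (int j = 0; j < n; j++) {
                    grad += ata[i + j * n] * x[j];
                }
                // Gradient step followed by projection onto x >= 0
                next[i] = Math.max(0.0, x[i] - step * grad);
                maxChange = Math.max(maxChange, Math.abs(next[i] - x[i]));
            }
            x = next;
            if (maxChange < tol) {
                break; // iterates have stopped moving
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // A^T A and A^T b for A = [[1,2],[3,4],[5,6]], b = [1,2,3]
        double[] ata = {35, 44, 44, 56};
        double[] atb = {22, 28};
        double[] x = solve(ata, atb, 100_000, 1e-12);
        System.out.println(java.util.Arrays.toString(x));
    }
}
```

A plain projected gradient like this can converge slowly on ill-conditioned problems, so for anything beyond small experiments a dedicated NNLS routine from a numerical library is likely a better choice.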