Difference between spark Vectors and scala immutable Vector?
Question
I am writing a project for Spark 1.4 in Scala and am currently torn between converting my initial input data into spark.mllib.linalg.Vectors or into scala.immutable.Vector, which I later want to work with in my algorithm. Could someone briefly explain the difference between the two, and in which situations one is more useful than the other?
Thanks.
Answer
spark.mllib.linalg.Vector is designed for linear algebra applications. mllib provides two different implementations: DenseVector and SparseVector. While you have access to useful helper methods like Vectors.norm or Vectors.sqdist, it is rather limited otherwise.
Like all data structures in org.apache.spark.mllib.linalg, it can store only 64-bit floating point numbers (scala.Double).
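A minimal sketch of the two implementations, assuming spark-mllib 1.x is on the classpath (for example, pasted into spark-shell). The helper describe is hypothetical and only there to show which concrete class each factory method returns; note that every value is a scala.Double.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

// A dense vector stores every entry explicitly.
val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)

// A sparse vector of size 3 that stores only the non-zero entries at indices 0 and 2.
val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// Hypothetical helper: report which mllib implementation backs a Vector.
def describe(v: Vector): String = v match {
  case d: DenseVector  => s"dense, ${d.values.length} stored values"
  case s: SparseVector => s"sparse, ${s.indices.length} stored values"
}

// The linear-algebra helpers live on the Vectors object.
val l2   = Vectors.norm(dense, 2.0)      // Euclidean norm: sqrt(1 + 0 + 9)
val dist = Vectors.sqdist(dense, sparse) // squared distance; both encode the same point
```

Beyond helpers like these (and toArray, size, apply), there is no map, filter or similar collection API on an mllib Vector.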
If you plan to use mllib, then spark.mllib.linalg.Vector is pretty much your only option. All the remaining data structures in mllib, both local and distributed, are built on top of org.apache.spark.mllib.linalg.Vector.
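For example, LabeledPoint, the input type of mllib's supervised algorithms, wraps a label and an mllib Vector. A minimal sketch, again assuming spark-mllib 1.x on the classpath, of moving raw data held in a Scala Vector[Double] into that form and back; the values are made up for illustration.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Raw input held in an ordinary Scala collection (illustrative values).
val raw: scala.collection.immutable.Vector[Double] =
  scala.collection.immutable.Vector(0.5, 1.5, 2.0)

// An mllib Vector is not a Scala collection, so convert through Array[Double].
val features = Vectors.dense(raw.toArray)

// LabeledPoint pairs a Double label with an mllib Vector of features.
val point = LabeledPoint(1.0, features)

// Going back to a Scala Vector, e.g. to use the collections API.
val back: scala.collection.immutable.Vector[Double] = point.features.toArray.toVector
```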
Otherwise, scala.collection.immutable.Vector is probably a much better choice. It is a general-purpose, dense data structure.
It can store objects of any type, so you can have a Vector[String], for example.
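A minimal sketch using only the Scala standard library: the element type is a type parameter, and because the collection is immutable, "updating" an element returns a new Vector and leaves the original untouched. The contents are illustrative.

```scala
// The element type is a type parameter, so non-numeric data is fine.
val words: Vector[String] = Vector("spark", "scala", "breeze")

// Immutable: updated and :+ return new Vectors; words itself is unchanged.
val renamed  = words.updated(2, "mllib")
val extended = words :+ "mllib"
```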
Since it is Traversable, you have access to all the expected methods like map, flatMap, reduce, fold, filter, etc.
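A short standard-library sketch of those methods on a Vector[Double]. There is no built-in dot product, so the last line writes one by hand with zip; the input values are illustrative.

```scala
val xs: Vector[Double] = Vector(1.0, 2.0, 3.0, 4.0)
val ys: Vector[Double] = Vector(1.0, 1.0, 1.0, 1.0)

val doubled  = xs.map(_ * 2)                          // Vector(2.0, 4.0, 6.0, 8.0)
val expanded = xs.flatMap(x => Vector(x, -x))         // each element followed by its negation
val total    = xs.reduce(_ + _)                       // 10.0
val sumSq    = xs.fold(0.0)((acc, x) => acc + x * x)  // 30.0
val evens    = xs.filter(_ % 2 == 0)                  // Vector(2.0, 4.0)

// Linear algebra has to be written by hand, e.g. a dot product via zip.
val dot = xs.zip(ys).map { case (x, y) => x * y }.sum // 10.0
```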
Edit: If you need algebraic operations and don't use any of the data structures from org.apache.spark.mllib.linalg.distributed, you may prefer breeze.linalg.Vector over spark.mllib.linalg.Vector. It supports a larger set of algebraic methods, including the dot product, and provides the typical collection API.
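A minimal sketch, assuming Breeze is on the classpath (spark-mllib 1.x depends on it) alongside spark-mllib. Spark's own mllib-to-Breeze conversion is not public, so the sketch converts through toArray in both directions; the aliases BDV and MLVector are just local renames.

```scala
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.{Vectors, Vector => MLVector}

val a = new BDV(Array(1.0, 2.0, 3.0))
val b = new BDV(Array(4.0, 5.0, 6.0))

val dotAB   = a dot b         // 1*4 + 2*5 + 3*6 = 32.0
val added   = a + b           // element-wise: 5.0, 7.0, 9.0
val scaled  = a * 2.0         // scalar multiplication: 2.0, 4.0, 6.0
val shifted = a.map(_ + 1.0)  // collection-style API: 2.0, 3.0, 4.0

// mllib -> Breeze and back, going through Array[Double].
val mllibVec: MLVector    = Vectors.dense(1.0, 2.0, 3.0)
val asBreeze              = new BDV(mllibVec.toArray)
val backToMllib: MLVector = Vectors.dense(added.toArray)
```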