Difference between Spark Vectors and Scala immutable Vector?
Question
I am writing a project for Spark 1.4 in Scala and am currently choosing between converting my initial input data into spark.mllib.linalg.Vectors or into scala.immutable.Vector, which I later want to work with in my algorithm. Could someone briefly explain the difference between the two, and in which situations one would be more useful than the other?
Thank you.
Recommended answer
spark.mllib.linalg.Vector is designed for linear algebra applications. mllib provides two different implementations: DenseVector and SparseVector. While you have access to useful methods like norm or sqdist, it is rather limited otherwise.
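As a minimal sketch of what this looks like in practice, assuming Spark's mllib is on the classpath: the object name MllibVectorSketch and the sample values are made up for illustration, while Vectors.dense, Vectors.sparse, Vectors.norm and Vectors.sqdist are the factory and helper methods on the Vectors object.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical demo object; only the Vectors calls come from mllib.
object MllibVectorSketch {
  // Dense vector: every entry is stored explicitly.
  val dense: Vector = Vectors.dense(3.0, 0.0, 4.0)

  // Sparse vector of size 3 with non-zero entries at indices 0 and 2.
  val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(3.0, 4.0))

  // norm and sqdist are exposed on the Vectors object, not on the vector itself.
  val l2Norm: Double = Vectors.norm(dense, 2.0)
  val squaredDistance: Double = Vectors.sqdist(dense, sparse)

  def main(args: Array[String]): Unit = {
    println(s"L2 norm: $l2Norm")
    println(s"Squared distance: $squaredDistance")
  }
}
```

Both vectors hold the same values, so the dense and sparse forms differ only in how they are stored.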
Like all data structures from org.apache.spark.mllib.linalg, it can store only 64-bit floating point numbers (scala.Double).
If you plan to use mllib, then spark.mllib.linalg.Vector is pretty much your only option. All the remaining data structures from mllib, both local and distributed, are built on top of org.apache.spark.mllib.linalg.Vector.
Otherwise, scala.immutable.Vector is probably a much better choice. It is a general purpose, dense data structure.
It can store objects of any type, so you can have Vector[String], for example.
Since it is Traversable, you have access to all the expected methods like map, flatMap, reduce, fold, filter, etc.
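For illustration, here is a small sketch of those collection methods on a Vector[String], using only the Scala standard library. The object name ImmutableVectorSketch and the sample words are invented for this example.

```scala
// Hypothetical demo object; everything used here is from the standard library.
object ImmutableVectorSketch {
  val words: Vector[String] = Vector("spark", "scala", "vector")

  // map: transform each element.
  val lengths: Vector[Int] = words.map(_.length)

  // flatMap: expand each word into its characters and flatten the result.
  val letters: Vector[Char] = words.flatMap(_.toVector)

  // filter: keep only the elements matching a predicate.
  val longWords: Vector[String] = words.filter(_.length > 5)

  // reduce and fold: combine the elements into a single value.
  val totalLength: Int = lengths.reduce(_ + _)
  val joined: String = words.fold("")(_ + _)

  // Immutability: updated returns a new Vector and leaves words unchanged.
  val replaced: Vector[String] = words.updated(0, "breeze")

  def main(args: Array[String]): Unit = {
    println(lengths)
    println(longWords)
    println(totalLength)
    println(replaced)
  }
}
```

None of these calls is numeric in nature, which is the point: the element type is arbitrary, but there are no built-in algebraic operations such as a norm or a dot product.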
Edit: If you need algebraic operations and don't use any of the data structures from org.apache.spark.mllib.linalg.distributed, you may prefer breeze.linalg.Vector over spark.mllib.linalg.Vector. It supports a larger set of algebraic methods, including the dot product, and provides a typical collection API.
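A rough sketch of the Breeze side, assuming Breeze is available as a dependency (Spark's mllib itself depends on it). The object name BreezeVectorSketch and the sample values are illustrative only.

```scala
import breeze.linalg.{DenseVector, norm}

// Hypothetical demo object showing algebraic and collection-style operations.
object BreezeVectorSketch {
  val a: DenseVector[Double] = DenseVector(1.0, 2.0, 3.0)
  val b: DenseVector[Double] = DenseVector(4.0, 5.0, 6.0)

  // Algebraic operations.
  val dotProduct: Double = a dot b
  val sum: DenseVector[Double] = a + b
  val scaled: DenseVector[Double] = a * 2.0
  val length: Double = norm(a)

  // Collection-style API on the same type.
  val doubled: DenseVector[Double] = a.map(_ * 2.0)

  def main(args: Array[String]): Unit = {
    println(s"dot = $dotProduct, sum = $sum, norm = $length")
  }
}
```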