Difference between Spark Vectors and Scala immutable Vector?


Question

I am writing a project for Spark 1.4 in Scala and am currently deciding whether to convert my initial input data into spark.mllib.linalg.Vectors or scala.immutable.Vector, which I later want to work with in my algorithm. Could someone briefly explain the difference between the two, and in which situations one would be more useful than the other?

Thank you.

Recommended answer

spark.mllib.linalg.Vector is designed for linear algebra applications. mllib provides two different implementations: DenseVector and SparseVector. While you have access to useful methods like norm and sqdist, it is otherwise rather limited.
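As a rough illustration of the point above, here is a minimal sketch that builds one dense and one sparse mllib vector and calls norm and sqdist through the Vectors factory object. It assumes spark-mllib (1.3 or later) is on the classpath; the object name MllibVectorSketch and the sample values are made up for this example.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical example object; the name and values are illustrative only.
object MllibVectorSketch {
  // Dense representation: every entry is stored explicitly.
  val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)

  // Sparse representation of the same values: size, non-zero indices, non-zero values.
  val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

  // L2 norm of the dense vector.
  val l2Norm: Double = Vectors.norm(dense, 2.0)

  // Squared Euclidean distance between the two vectors.
  val squaredDistance: Double = Vectors.sqdist(dense, sparse)

  def main(args: Array[String]): Unit = {
    println(s"norm = $l2Norm, sqdist = $squaredDistance")
  }
}
```

Both vectors hold the same values here, so the squared distance comes out as zero. Beyond helpers like these, the mllib Vector interface offers little in the way of arithmetic or collection-style operations.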

Like all data structures from org.apache.spark.mllib.linalg, it can store only 64-bit floating point numbers (scala.Double).

If you plan to use mllib, then spark.mllib.linalg.Vector is pretty much your only option. All the remaining data structures from mllib, both local and distributed, are built on top of org.apache.spark.mllib.linalg.Vector.

Otherwise, scala.immutable.Vector is probably a much better choice. It is a general-purpose, dense data structure.

It can store objects of any type, so you can have a Vector[String], for example.

Since it is Traversable, you have access to all the expected methods like map, flatMap, reduce, fold, filter, etc.
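To make the contrast concrete, the following sketch uses only the Scala standard library to show a Vector[String] with a few of the collection methods mentioned above. The object name ScalaVectorSketch and the sample words are invented for illustration.

```scala
// Hypothetical example object; uses only the Scala standard library.
object ScalaVectorSketch {
  // A Vector can hold any element type, here String.
  val words: Vector[String] = Vector("spark", "scala", "breeze")

  // map: transform every element.
  val lengths: Vector[Int] = words.map(_.length)

  // filter: keep the elements matching a predicate.
  val startingWithS: Vector[String] = words.filter(_.startsWith("s"))

  // flatMap: expand each word into its characters and flatten the result.
  val chars: Vector[Char] = words.flatMap(_.toVector)

  // fold: combine all elements into a single value.
  val totalLength: Int = lengths.fold(0)(_ + _)

  // The vector is immutable: appending returns a new Vector and leaves words unchanged.
  val extended: Vector[String] = words :+ "mllib"

  def main(args: Array[String]): Unit = {
    println(s"lengths = $lengths, total = $totalLength, extended = $extended")
  }
}
```

None of this is specific to numbers, which is the trade-off: you get the full collection API but no linear algebra operations.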

Edit: If you need algebraic operations and don't use any of the data structures from org.apache.spark.mllib.linalg.distributed, you may prefer breeze.linalg.Vector over spark.mllib.linalg.Vector. It supports a larger set of algebraic methods, including the dot product, and provides a typical collection API.
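A small sketch of what that looks like in practice, assuming the Breeze library is on the classpath (Spark's mllib depends on it, but your build may need it declared explicitly). It uses breeze.linalg.DenseVector, one implementation of breeze.linalg.Vector; the object name BreezeVectorSketch and the values are hypothetical.

```scala
import breeze.linalg.DenseVector

// Hypothetical example object; the name and values are illustrative only.
object BreezeVectorSketch {
  val a: DenseVector[Double] = DenseVector(1.0, 2.0, 3.0)
  val b: DenseVector[Double] = DenseVector(4.0, 5.0, 6.0)

  // Dot product of the two vectors.
  val dotProduct: Double = a dot b

  // Element-wise addition and scalar multiplication.
  val sum: DenseVector[Double] = a + b
  val scaled: DenseVector[Double] = a * 2.0

  // Collection-style transformation of each element.
  val squared: DenseVector[Double] = a.map(x => x * x)

  def main(args: Array[String]): Unit = {
    println(s"dot = $dotProduct, sum = $sum, scaled = $scaled, squared = $squared")
  }
}
```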
