Difference between spark Vectors and scala immutable Vector?

Question

I am writing a project for Spark 1.4 in Scala and am currently deciding between converting my initial input data into spark.mllib.linalg.Vectors or scala.immutable.Vector, which I later want to work with in my algorithm. Could someone briefly explain the difference between the two, and in which situations one would be more useful than the other?

Thanks.

Recommended answer

spark.mllib.linalg.Vector is designed for linear algebra applications. mllib provides two implementations: DenseVector and SparseVector. While you have access to useful methods such as norm and sqdist (exposed on the Vectors companion object), the API is otherwise rather limited.
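
A minimal sketch of the two implementations and the helpers mentioned above, assuming spark-mllib 1.3 or later is on the classpath (Vectors.norm and Vectors.sqdist are not available in earlier releases). The object name MllibVectorDemo is hypothetical:

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}

object MllibVectorDemo {
  // Dense representation: every entry is stored explicitly
  val dense: Vector = Vectors.dense(1.0, 0.0, 3.0)

  // Sparse representation: size 3, non-zero entries at indices 0 and 2
  val sparse: Vector = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

  // Norm and squared distance are static helpers on the Vectors object
  val l2Norm: Double = Vectors.norm(dense, 2.0)          // sqrt(1 + 0 + 9)
  val distance: Double = Vectors.sqdist(dense, sparse)   // same values, so 0.0

  def main(args: Array[String]): Unit = {
    // The factories return the common Vector trait, backed by the two concrete classes
    println(dense.isInstanceOf[DenseVector])   // true
    println(sparse.isInstanceOf[SparseVector]) // true
    println(l2Norm)
    println(distance)
    // Beyond that the API is small: size, element access, toArray, ...
    println(sparse.size)                       // 3
    println(sparse(2))                         // 3.0
  }
}
```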

Like all data structures in org.apache.spark.mllib.linalg, it can store only 64-bit floating-point numbers (scala.Double).
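
Because only Double is supported, data in other numeric types has to be converted before it goes into an mllib vector. A short sketch, assuming spark-mllib is on the classpath; the counts array and the object name are hypothetical:

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object DoubleOnlyDemo {
  // Hypothetical raw input: integer counts
  val counts: Array[Int] = Array(4, 0, 7)

  // Vectors.dense takes Array[Double], so each Int is widened to Double first
  val features: Vector = Vectors.dense(counts.map(_.toDouble))

  def main(args: Array[String]): Unit = {
    println(features)
    // Reading values back always yields a Double
    val first: Double = features(0)
    println(first)
  }
}
```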

If you plan to use mllib, then spark.mllib.linalg.Vector is pretty much your only option. All the remaining data structures in mllib, both local and distributed, are built on top of org.apache.spark.mllib.linalg.Vector.
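
For example, mllib's LabeledPoint (the labeled example type used by its supervised algorithms) wraps an mllib Vector, so data held in a plain Scala collection has to be converted first. A sketch assuming spark-mllib is on the classpath; the input values and object name are hypothetical, and the import alias MLlibVector only avoids a clash with Scala's own Vector:

```scala
import org.apache.spark.mllib.linalg.{Vectors, Vector => MLlibVector}
import org.apache.spark.mllib.regression.LabeledPoint

object LabeledPointDemo {
  // Hypothetical input held in Scala's immutable Vector
  val raw: Vector[Double] = Vector(0.5, 1.5, 2.0)

  // Convert to an mllib vector by going through Array[Double]
  val features: MLlibVector = Vectors.dense(raw.toArray)

  // LabeledPoint pairs a Double label with an mllib Vector of features
  val point: LabeledPoint = LabeledPoint(1.0, features)

  def main(args: Array[String]): Unit = {
    println(point.label)
    println(point.features)
  }
}
```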

Otherwise, scala.collection.immutable.Vector is probably a much better choice. It is a general-purpose, dense data structure.

It can store objects of any type, so you can have a Vector[String], for example.
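
A small illustration using only the standard library (the object name is hypothetical). The element type is a type parameter, and since the collection is immutable, an update returns a new Vector and leaves the original unchanged:

```scala
object AnyTypeDemo {
  // Element type is chosen by the caller: here String, then (String, Int) tuples
  val words: Vector[String] = Vector("spark", "scala", "breeze")
  val pairs: Vector[(String, Int)] = words.map(w => (w, w.length))

  // "Updating" produces a new Vector; `words` itself is not modified
  val updated: Vector[String] = words.updated(0, "mllib")

  def main(args: Array[String]): Unit = {
    println(pairs)   // Vector((spark,5), (scala,5), (breeze,6))
    println(words)   // Vector(spark, scala, breeze)
    println(updated) // Vector(mllib, scala, breeze)
  }
}
```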

Since it is Traversable, you have access to all the expected methods, such as map, flatMap, reduce, fold, and filter.
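
The collection methods listed above, applied to a small Vector[Int] (standard library only; the object name is hypothetical):

```scala
object CollectionOpsDemo {
  val xs: Vector[Int] = Vector(1, 2, 3, 4)

  val doubled: Vector[Int]  = xs.map(_ * 2)                  // Vector(2, 4, 6, 8)
  val expanded: Vector[Int] = xs.flatMap(x => Vector(x, -x)) // Vector(1, -1, 2, -2, 3, -3, 4, -4)
  val total: Int            = xs.reduce(_ + _)               // 10
  val product: Int          = xs.fold(1)(_ * _)              // 24
  val evens: Vector[Int]    = xs.filter(_ % 2 == 0)          // Vector(2, 4)

  def main(args: Array[String]): Unit = {
    println(doubled)
    println(expanded)
    println(total)
    println(product)
    println(evens)
  }
}
```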

Edit: If you need algebraic operations and don't use any of the data structures from org.apache.spark.mllib.linalg.distributed, you may prefer breeze.linalg.Vector over spark.mllib.linalg.Vector. It supports a larger set of algebraic methods, including the dot product, and provides a typical collection API.
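
A sketch of the Breeze side, assuming Breeze is on the classpath (Spark's mllib already depends on it). The conversion copies values through toArray because, as far as I know, mllib's own Breeze converters are not part of its public API; the object name is hypothetical:

```scala
import breeze.linalg.DenseVector
import org.apache.spark.mllib.linalg.Vectors

object BreezeDemo {
  val a: DenseVector[Double] = DenseVector(1.0, 2.0, 3.0)
  val b: DenseVector[Double] = DenseVector(4.0, 5.0, 6.0)

  // Algebraic operations: dot product and element-wise addition
  val dotProduct: Double = a dot b          // 1*4 + 2*5 + 3*6 = 32.0
  val sum: DenseVector[Double] = a + b      // values 5.0, 7.0, 9.0

  // Collection-style API: map returns another DenseVector
  val squares: DenseVector[Double] = a.map(x => x * x) // values 1.0, 4.0, 9.0

  // Converting an mllib vector by copying its values into a Breeze DenseVector
  val fromMllib: DenseVector[Double] = new DenseVector(Vectors.dense(1.0, 2.0, 3.0).toArray)

  def main(args: Array[String]): Unit = {
    println(dotProduct)
    println(sum)
    println(squares)
    println(fromMllib)
  }
}
```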
