How to convert an MLlib matrix to a Spark DataFrame?

Question

I want to pretty-print the result of a correlation in a Zeppelin notebook:

val Row(coeff: Matrix) = Correlation.corr(data, "features").head

One way to achieve this is to convert the result into a DataFrame with each value in a separate column and call z.show().

However, looking at the Matrix API, I don't see any way to do this.

Is there another straightforward way to achieve this?

The DataFrame has 50 columns. Simply converting the result to a string does not help, because the output gets truncated.

Answer

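If you simply want to print the matrix, the toString method should be the easiest and fastest way. The overload toString(maxLines, maxLineWidth) lets you set the maximum number of lines to print and the maximum line width, so passing values large enough for a 50 × 50 matrix should avoid the truncation. You can then change the formatting by splitting the string on newlines and whitespace and joining the values with commas. For example: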

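// ml.linalg matches the Matrix type returned by org.apache.spark.ml.stat.Correlation
import org.apache.spark.ml.linalg.Matrices

// Matrices.dense reads its values in column-major order (2 rows, 3 columns)
val matrix = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
matrix.toString                                      // or toString(maxLines, maxLineWidth)
  .split("\n")                                       // one string per matrix row
  .map(_.trim.split("\\s+").mkString("[", ",", "]")) // comma-separate the values
  .mkString("\n")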

This gives the following:

[1.0,3.0,5.0]
[2.0,4.0,6.0]

However, if you want to convert the matrix to a DataFrame, the easiest way is to first create an RDD and then call toDF().

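import spark.implicits._  // provides toDF and the $"..." column syntax
// rowIter yields one Vector per matrix row; convert each to Array[Double]
val matrixRows = matrix.rowIter.toSeq.map(_.toArray)
val df = spark.sparkContext.parallelize(matrixRows).toDF("Row")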
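Going through rowIter matters because Matrices.dense stores its values column by column, so the 2 × 3 example holds the rows (1.0, 3.0, 5.0) and (2.0, 4.0, 6.0), not (1.0, 2.0, 3.0) and (4.0, 5.0, 6.0). The plain-Scala sketch below (no Spark needed) reproduces that index mapping, where entry (i, j) is values(i + j * numRows); the object and method names are hypothetical and only illustrate the layout.

```scala
// Plain Scala, no Spark dependency. Hypothetical helper for illustration:
// it mimics how a dense matrix stored in column-major order maps to rows.
object ColumnMajor {
  /** Rows of a numRows x numCols matrix whose entries are stored column by
    * column in `values`, i.e. entry (i, j) is values(i + j * numRows).
    */
  def toRows(numRows: Int, numCols: Int, values: Array[Double]): Seq[Seq[Double]] = {
    require(values.length == numRows * numCols, "values must hold numRows * numCols entries")
    (0 until numRows).map { i =>
      (0 until numCols).map(j => values(i + j * numRows))
    }
  }
}
```

For the example, ColumnMajor.toRows(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)) gives the rows (1.0, 3.0, 5.0) and (2.0, 4.0, 6.0), which is the same row order rowIter produces for matrix.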

Then, to put each value in a separate column, you can do the following:

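// Add one column per matrix column, holding element i of the "Row" array
val numOfCols = matrixRows.head.length
val df2 = (0 until numOfCols).foldLeft(df)((acc, i) =>
    acc.withColumn("Col" + i, $"Row".getItem(i)))
  .drop("Row")
// show() prints only 20 rows by default; print every row without truncating values
df2.show(matrixRows.length, false)
// in Zeppelin you can instead call z.show(df2)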

Result with the example data:

+----+----+----+
|Col0|Col1|Col2|
+----+----+----+
|1.0 |3.0 |5.0 |
|2.0 |4.0 |6.0 |
+----+----+----+
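If the goal is only to display the coefficients in Zeppelin, another option is Zeppelin's table display system: paragraph output that starts with %table, with tab-separated cells and newline-separated rows, is rendered as a table. The plain-Scala sketch below builds such a string from the matrix rows; ZeppelinTable, toZeppelinTable and the "Col" header names are hypothetical helpers chosen to mirror the DataFrame above, not part of the Spark or Zeppelin APIs.

```scala
// Plain Scala, no Spark or Zeppelin dependency.
// Hypothetical helper: formats rows as Zeppelin "%table" display output
// (header row first, tab-separated cells, one line per row).
object ZeppelinTable {
  def toZeppelinTable(rows: Seq[Array[Double]]): String = {
    val numCols = rows.headOption.map(_.length).getOrElse(0)
    val header = (0 until numCols).map(i => s"Col$i").mkString("\t")
    val lines = ("%table " + header) +: rows.map(_.mkString("\t"))
    lines.mkString("\n")
  }
}
```

In a Zeppelin Scala paragraph, something like println(ZeppelinTable.toZeppelinTable(matrix.rowIter.map(_.toArray).toList)) should then render all 50 columns as a table without going through a DataFrame.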
