PrefixSpan output formatting


Problem description

I am trying to run the following example code:

import org.apache.spark.mllib.fpm.PrefixSpan

// `sc` is the SparkContext that spark-shell provides.
// Each sequence is an array of itemsets; each itemset is an array of items.
val sequences = sc.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2)),
  Array(Array(1, 2), Array(5)),
  Array(Array(6))
), 2).cache()

val prefixSpan = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(5)

val model = prefixSpan.run(sequences)
model.freqSequences.collect().foreach { freqSequence =>
  println(
    freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") +
      ", " + freqSequence.freq
  )
}
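The printing loop nests two mkString calls because each freqSequence.sequence is an array of itemsets (Array[Array[Item]]), not a flat list of items. A minimal Spark-free sketch of the same formatting, assuming a hypothetical FreqSeq case class as a stand-in for PrefixSpan.FreqSequence and made-up sample patterns:

```scala
// Hypothetical stand-in for org.apache.spark.mllib.fpm.PrefixSpan.FreqSequence,
// which exposes `sequence: Array[Array[Item]]` and `freq: Long`.
final case class FreqSeq(sequence: Array[Array[Int]], freq: Long)

// Same formatting as the loop in the question: the inner mkString renders
// one itemset, the outer mkString joins the itemsets of the whole pattern.
def formatFreqSeq(fs: FreqSeq): String =
  fs.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") + ", " + fs.freq

// Made-up sample patterns, for illustration only
val samplePatterns = Seq(
  FreqSeq(Array(Array(2)), 3),
  FreqSeq(Array(Array(2, 1)), 3),
  FreqSeq(Array(Array(1), Array(3)), 2)
)

// Prints "[[2]], 3", then "[[2, 1]], 3", then "[[1], [3]], 2"
samplePatterns.map(formatFreqSeq).foreach(println)
```

The outer brackets are what distinguish a pattern with one itemset from a pattern with several.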

I need to format model.freqSequences into something like the following (a DataFrame with sequence and freq columns):

|[WrappedArray(2,3)] |  3
|[WrappedArray(1)]   |  2
|[WrappedArray(2,1)] |  1

Recommended answer

freqSequence.sequence 上使用 flatten 并应用 toDF 应该可以期望的输出

Using flatten on freqSequence.sequence and applying toDF should give your desired output

model.freqSequences.map(freqSequence => (freqSequence.sequence.flatten, freqSequence.freq)).toDF("array", "freq").show(false)

This should give you:

+------+----+
|array |freq|
+------+----+
|[2]   |3   |
|[3]   |2   |
|[1]   |3   |
|[2, 1]|3   |
|[1, 3]|2   |
+------+----+
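One caveat: flatten merges all itemsets of a pattern into a single list, so the [1, 3] row could stand for either [[1], [3]] (item 1, then item 3 in a later itemset) or [[1, 3]] (both items in the same itemset). A minimal Spark-free sketch of the difference, again assuming a hypothetical FreqSeq stand-in for PrefixSpan.FreqSequence; flattenedRow and nestedRow are illustrative names, not Spark APIs:

```scala
// Hypothetical stand-in for PrefixSpan.FreqSequence (sequence: Array[Array[Item]], freq: Long)
final case class FreqSeq(sequence: Array[Array[Int]], freq: Long)

// The answer's per-pattern mapping: merge all itemsets into one flat list.
def flattenedRow(fs: FreqSeq): (List[Int], Long) =
  (fs.sequence.flatten.toList, fs.freq)

// Alternative mapping that keeps each itemset as its own inner list.
def nestedRow(fs: FreqSeq): (List[List[Int]], Long) =
  (fs.sequence.map(_.toList).toList, fs.freq)

val together = FreqSeq(Array(Array(1, 3)), 2)        // 1 and 3 in the same itemset
val inOrder  = FreqSeq(Array(Array(1), Array(3)), 2) // 1, then 3 in a later itemset

// Prints true: both patterns collapse to the same flat row
println(flattenedRow(together) == flattenedRow(inOrder))
// Prints false: the nested form still tells them apart
println(nestedRow(together) == nestedRow(inOrder))
```

In the Spark code, the nested variant would be model.freqSequences.map(fs => (fs.sequence.map(_.toSeq).toSeq, fs.freq)).toDF("sequence", "freq"), which should yield an array<array<int>> column instead of a flat array<int>; use it when the itemset boundaries matter.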

I hope this helps.

