How do I convert a Spark DataFrame to an RDD and get a bag of words?

Views: 190 | Posted: 2016/5/22 15:46:13 | Tags: apache-spark, apache-spark-sql, apache-spark-ml


Problem description

I have a DataFrame called article:

+--------------------+
|     processed_title|
+--------------------+
|[new, relictual, ...|
|[once, upon,a,time..|
+--------------------+

I want to flatten it to get a bag of words. How can I achieve this with the current setup? I tried the code below, but it gives me a type mismatch error.

val bow_corpus = article.select("processed_title").rdd.flatMap(y => y)

I eventually want to use this bow_corpus to train a word2vec model.

Thanks

Recommended answer

Assuming that processed_title is represented in SQL as array<string>:

article.select("processed_title").rdd.flatMap(_.getSeq[String](0))
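A minimal end-to-end sketch of the flatMap above, assuming Spark 2.x or later running locally. The session settings and the sample rows are hypothetical stand-ins for the question's article DataFrame. It turns the flattened words into a bag of words by counting them with reduceByKey, and also shows an equivalent route that stays in the DataFrame API using explode.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Local session for the sketch; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("bag-of-words-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data shaped like the question's DataFrame:
// a single column "processed_title" of type array<string>.
val article: DataFrame = Seq(
  Seq("new", "relictual", "species"),
  Seq("once", "upon", "a", "time", "new")
).toDF("processed_title")

// Each Row wraps one array<string>; getSeq[String](0) unwraps it so that
// flatMap receives a Seq[String] (a collection) instead of a Row.
val bowCorpus: RDD[String] =
  article.select("processed_title").rdd.flatMap(_.getSeq[String](0))

// Bag of words as (word, count) pairs, collected to the driver.
val wordCounts: Map[String, Int] = bowCorpus
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .collect()
  .toMap

// The same flattening without leaving the DataFrame API:
// one output row per array element, then a count per word.
val wordCountsDf: DataFrame = article
  .select(explode(col("processed_title")).as("word"))
  .groupBy("word")
  .count()
```

Collecting to a Map is only sensible for small vocabularies; on real data you would keep the counts as an RDD or DataFrame.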

There is also a Word2Vec estimator in spark.ml that can be trained directly on a DataFrame:

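import org.apache.spark.ml.feature.Word2Vec

// Word2Vec is an Estimator: fit() returns a Word2VecModel
val word2VecModel = new Word2Vec()
  .setInputCol("processed_title") // array<string> column of tokens
  .setOutputCol("vectors")        // one averaged vector per row
  .setMinCount(0)                 // keep every word, however rare
  .fit(article)

// Returns a DataFrame with columns "word" and "similarity"
word2VecModel.findSynonyms("foo", 1).show()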
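If you would rather train on an RDD, as the question intends, note that the RDD-based org.apache.spark.mllib.feature.Word2Vec expects one collection of words per sentence. Flattening every title into a single stream of words would not type-check there and would lose the sentence context, so the rows are mapped rather than flat-mapped. This is a sketch assuming Spark 2.x or later; the sample rows, vector size, and seed are hypothetical values chosen for a toy corpus.

```scala
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("mllib-word2vec-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data with an array<string> column.
val article = Seq(
  Seq("once", "upon", "a", "time"),
  Seq("once", "upon", "a", "midnight")
).toDF("processed_title")

// One Seq[String] per row: map (not flatMap) keeps each title as a sentence.
val sentences: RDD[Seq[String]] =
  article.select("processed_title").rdd.map(_.getSeq[String](0))

val model: Word2VecModel = new Word2Vec()
  .setVectorSize(10) // small vectors for a toy corpus
  .setMinCount(1)    // keep every word in this tiny example
  .setSeed(42L)      // fixed seed for repeatable training
  .fit(sentences)

// Array of (word, cosine similarity) pairs, excluding "upon" itself.
val synonyms: Array[(String, Double)] = model.findSynonyms("upon", 2)
```

With a corpus this small the similarities are essentially noise; the point is the shape of the input, not the quality of the vectors.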

See also: Spark extracting values from a Row
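On extracting values from a Row: a small sketch of three ways to pull the array<string> column out of each Row before flattening, namely by position with getSeq, by field name with getAs, and by pattern matching on Row. It assumes a local Spark 2.x or later session and uses hypothetical sample data.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-extraction-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical single-row DataFrame with an array<string> column.
val article = Seq(Seq("once", "upon", "a", "time")).toDF("processed_title")
val rows: RDD[Row] = article.select("processed_title").rdd

// By position: field 0 of each Row.
val byIndex: RDD[String] = rows.flatMap(_.getSeq[String](0))

// By field name: works because rows from a DataFrame carry their schema.
val byName: RDD[String] = rows.flatMap(_.getAs[Seq[String]]("processed_title"))

// By pattern matching: the elements come back as Any, so convert them.
val byPattern: RDD[String] = rows.flatMap {
  case Row(words: Seq[_]) => words.map(_.toString)
}
```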
