How to transform Dataset[(String, Seq[String])] to Dataset[(String, String)]?
Question
This is probably a simple problem, but I am just starting out with Spark.
Problem: I would like to get the following structure (see the expected result) in Spark. This is the structure I have now:
title1, {word11, word12, word13 ...}
title2, {word12, word22, word23 ...}
The data is stored in a Dataset[(String, Seq[String])].
Expected result: I would like to get tuples of (word, title):
word11, {title1}
word12, {title1}
What I did
1. Built (title, Seq[word1, word2, word3]):
docs.mapPartitions { iter =>
  iter.map {
    case (title, contents) => {
      val textToLemmas: Seq[String] = toText(....)
      (title, textToLemmas)
    }
  }
}
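- I tried to use .map to transform the structure into tuples, but I could not get it to work.
- I tried to iterate over all the elements, but then I could not return the right type.
Thanks for your answers.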
Recommended answer
This should work:
val result = dataSet.flatMap { case (title, words) => words.map((_, title)) }
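For context, here is a minimal self-contained sketch of the same flatMap call run against the sample data from the question. The object name InvertTitles, the helper invert, and the local SparkSession settings are assumptions made for illustration; they are not part of the original answer. Each (title, words) row is expanded into one (word, title) row per word, which is why flatMap is used rather than map.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical wrapper object, used only to make the example runnable.
object InvertTitles {

  // Expands each (title, words) row into one (word, title) row per word.
  def invert(spark: SparkSession,
             ds: Dataset[(String, Seq[String])]): Dataset[(String, String)] = {
    import spark.implicits._ // provides the Encoder for (String, String)
    ds.flatMap { case (title, words) => words.map(word => (word, title)) }
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("invert-titles")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the structure shown in the question.
    val dataSet: Dataset[(String, Seq[String])] = Seq(
      ("title1", Seq("word11", "word12", "word13")),
      ("title2", Seq("word12", "word22", "word23"))
    ).toDS()

    val result = invert(spark, dataSet)
    result.show(truncate = false)

    spark.stop()
  }
}
```

A word that occurs under several titles (word12 in the sample data) produces one row per title. If one row per word with all of its titles is needed instead, the result would still have to be grouped by word afterwards.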