How to parse YAML with Spark/Scala
Question
I have a YAML file with the following details. File name: config.yml
- firstName: "James"
  lastName: "Bond"
  age: 30
- firstName: "Super"
  lastName: "Man"
  age: 25
From this I need to get a Spark DataFrame, using Spark with Scala:
+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+
I have tried converting to JSON and then to a DataFrame, but I am not able to specify it in a Dataset sequence.
Answer
There is a solution that will help you convert your YAML to JSON and then read it as a DataFrame.
You need to add these two imports:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
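These imports come from Jackson's databind and YAML dataformat modules. If they are not already on your classpath, build.sbt entries along these lines would pull them in (the version number here is illustrative; match the Jackson version bundled with your Spark distribution to avoid binary conflicts):

```scala
// build.sbt -- version 2.6.7 is only an example; check which Jackson
// version your Spark release ships with and use that one.
libraryDependencies ++= Seq(
  "com.fasterxml.jackson.core"       % "jackson-databind"        % "2.6.7",
  "com.fasterxml.jackson.dataformat" % "jackson-dataformat-yaml" % "2.6.7"
)
```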
import org.apache.spark.sql.SparkSession

class ScalaYamltoDataFrame {

  // Note the two-space indentation inside each list item, so that
  // lastName and age align with firstName and parse as one mapping.
  val yamlExample: String =
    "- firstName: \"James\"\n  lastName: \"Bond\"\n  age: 30\n\n- firstName: \"Super\"\n  lastName: \"Man\"\n  age: 25"

  // Use Jackson to read the YAML and re-serialize it as JSON
  def convertYamlToJson(yaml: String): String = {
    val yamlReader = new ObjectMapper(new YAMLFactory)
    val obj = yamlReader.readValue(yaml, classOf[Any])
    val jsonWriter = new ObjectMapper
    jsonWriter.writeValueAsString(obj)
  }

  println(convertYamlToJson(yamlExample))

  def yamlToDF(): Unit = {
    @transient
    lazy val sparkSession = SparkSession.builder
      .master("local")
      .appName("Convert Yaml to Dataframe")
      .getOrCreate()
    import sparkSession.implicits._

    // Wrap the JSON string in a Dataset[String] and let Spark infer the schema
    val ds = sparkSession.read
      .option("multiline", true)
      .json(Seq(convertYamlToJson(yamlExample)).toDS)
    ds.show(false)
    ds.printSchema()
  }
}
//println(convertYamlToJson(yamlExample))
[{"firstName":"James","lastName":"Bond","age":30},{"firstName":"Super","lastName":"Man","age":25}]
//ds.show(false)
+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+
//ds.printSchema()
root
|-- age: long (nullable = true)
|-- firstName: string (nullable = true)
|-- lastName: string (nullable = true)
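The answer above hardcodes the YAML as a string, while the question reads it from config.yml. A minimal sketch of pulling the file's contents into a string before handing it to convertYamlToJson, assuming the file sits on the driver's local filesystem (the demo writes a small sample file first so it is self-contained):

```scala
import java.nio.file.{Files, Paths}
import scala.io.Source

// Write a small sample config.yml so the snippet runs on its own;
// in practice the file would already exist.
val yamlText = "- firstName: \"James\"\n  lastName: \"Bond\"\n  age: 30"
Files.write(Paths.get("config.yml"), yamlText.getBytes("UTF-8"))

// Read the whole file into a string, closing the source afterwards
val source = Source.fromFile("config.yml")
val yamlFromFile = try source.mkString finally source.close()

// yamlFromFile now holds the raw YAML; feed it to the same pipeline:
// val ds = sparkSession.read.json(Seq(convertYamlToJson(yamlFromFile)).toDS)
```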
Hope this helps!