How to parse a YAML with spark/scala

Problem description

I have a YAML file with the following contents. File name: config.yml

- firstName: "James"
  lastName: "Bond"
  age: 30

- firstName: "Super"
  lastName: "Man"
  age: 25

From this I need to get a Spark DataFrame, using Spark with Scala:

+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+

I have tried converting the YAML to JSON and then to a DataFrame, but I am not able to specify it as a dataset sequence.

Recommended answer

There is a solution that will help you convert your YAML to JSON and then read it as a DataFrame.

You need to add these two dependencies:
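
(If you build with sbt, the two Jackson imports below come from the jackson-databind and jackson-dataformat-yaml artifacts. A minimal build.sbt sketch is shown here; the version numbers are assumptions, so align them with the Jackson version that ships with your Spark distribution.)

// build.sbt (sketch) -- version numbers are assumptions
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided",
  // jackson-databind is normally pulled in transitively by Spark
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7",
  // provides the YAMLFactory used in the code below
  "com.fasterxml.jackson.dataformat" % "jackson-dataformat-yaml" % "2.6.7"
)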

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory
import org.apache.spark.sql.SparkSession

class ScalaYamltoDataFrame {

  val yamlExample = "- firstName: \"James\"\n  lastName: \"Bond\"\n  age: 30\n\n- firstName: \"Super\"\n  lastName: \"Man\"\n  age: 25"

  // Parse the YAML with Jackson's YAML factory, then re-serialize the object
  // tree as a JSON string that Spark's JSON reader understands.
  def convertYamlToJson(yaml: String): String = {
    val yamlReader = new ObjectMapper(new YAMLFactory)
    val obj = yamlReader.readValue(yaml, classOf[Any])
    val jsonWriter = new ObjectMapper
    jsonWriter.writeValueAsString(obj)
  }

  println(convertYamlToJson(yamlExample))

  def yamlToDF(): Unit = {

    @transient
    lazy val sparkSession = SparkSession.builder
      .master("local")
      .appName("Convert Yaml to Dataframe")
      .getOrCreate()

    import sparkSession.implicits._

    // Wrap the JSON string in a Dataset[String] and let Spark infer the schema.
    val ds = sparkSession.read
      .option("multiline", true)
      .json(Seq(convertYamlToJson(yamlExample)).toDS)

    ds.show(false)

    ds.printSchema()
  }
}

//println(convertYamlToJson(yamlExample))
[{"firstName":"James","lastName":"Bond","age":30},{"firstName":"Super","lastName":"Man","age":25}]

//ds.show(false)
+---+---------+--------+
|age|firstName|lastName|
+---+---------+--------+
|30 |James    |Bond    |
|25 |Super    |Man     |
+---+---------+--------+


//ds.printSchema()
root
 |-- age: long (nullable = true)
 |-- firstName: string (nullable = true)
 |-- lastName: string (nullable = true)
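
Since the question starts from a file named config.yml rather than a hard-coded string, you would read the file contents yourself and pass them through convertYamlToJson. The sketch below assumes the file is on the local filesystem, and the Person case class is only illustrative.

import scala.io.Source
import org.apache.spark.sql.SparkSession

// Illustrative case class matching the YAML fields
case class Person(firstName: String, lastName: String, age: Long)

object YamlFileToDF {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("Yaml file to Dataset")
      .getOrCreate()
    import spark.implicits._

    // Read the whole YAML file into one string (local path, an assumption)
    val source = Source.fromFile("config.yml")
    val yamlContent = try source.mkString finally source.close()

    // Reuse the converter from the answer above
    val json = new ScalaYamltoDataFrame().convertYamlToJson(yamlContent)

    // Same trick: wrap the JSON string in a Dataset[String], then map the
    // untyped DataFrame to a typed Dataset[Person] if you want checked field types
    val people = spark.read.json(Seq(json).toDS).as[Person]
    people.show(false)
  }
}

Calling people.show(false) should print the same two rows as the DataFrame shown above.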

Hope this helps!
