Spark - Specifying Schema for nested Json


Question

Using Spark 1.2.0

I want to save data from a Kafka stream to Parquet, applying a schema to the JSON dataset when creating a table with jsonRDD, as described here: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html

The data comes from Kafka and arrives as nested JSON.

Here is a basic example, reading from a text file, of how I specified the schema for a non-nested JSON.

    //contents of json
    hdfs@2db12:~$ hadoop fs -cat User/names.json
    {"name":"Michael", "age":10}
    {"name":"Andy", "age":30}
    {"name":"Justin"}

    //create RDD from json
    scala> val names= sc.textFile("hdfs://10.0.11.8:8020/user/hdfs/User/names.json")
    scala> names.collect().foreach(println)
    {"name":"Michael", "age":10}
    {"name":"Andy", "age":30}
    {"name":"Justin"}

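    // specify schema (in Spark 1.2 the data types are exposed from org.apache.spark.sql)
    import org.apache.spark.sql._

    val schemaString = "name age gender"
    val schema =
      StructType(
        schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    // apply the explicit schema instead of letting Spark infer one
    val peopleSchemaRDD = sqlContext.jsonRDD(names, schema)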

    scala> peopleSchemaRDD.printSchema()
    root
    |-- name: string (nullable = true)
    |-- age: string (nullable = true)
    |-- gender: string (nullable = true)

    scala> peopleSchemaRDD.registerTempTable("people")

    scala> sqlContext.sql("SELECT name, age, gender FROM people").collect().foreach(println)
    [Michael,10,null]
    [Andy,30,null]
    [Justin,null,null]

Is it possible to specify the schema for a nested JSON? For example, a JSON like this:

    {"filename":"details","attributes":{"name":"Michael", "age":10}}

Many thanks

Recommended answer

You can use sqlContext.jsonFile(), which infers the schema from the data, provided at least one JSON record contains the gender field.
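As a rough illustration of the inference this relies on: jsonFile() reads the file as text and derives the schema from the records it sees, so a field shows up only if some record contains it. The sketch below uses the single-argument jsonRDD, which performs the same inference, on hypothetical in-memory records so that it does not depend on an HDFS path. It assumes a Spark 1.2.x spark-shell with sc and sqlContext already in scope; the sample records and the names sample, inferred and inferredNames are made up for illustration.

```scala
// Assumes spark-shell on Spark 1.2.x, where `sc` and `sqlContext` are predefined.

// Hypothetical in-memory records; only the last one carries "gender".
val sample = sc.parallelize(Seq(
  """{"name":"Michael", "age":10}""",
  """{"name":"Andy", "age":30}""",
  """{"name":"Justin", "gender":"male"}"""))

// No schema argument: Spark SQL scans the records and infers the fields.
val inferred = sqlContext.jsonRDD(sample)

// Collect the inferred top-level field names and print them.
val inferredNames = inferred.schema.fields.map(_.name)
println(inferredNames.mkString(", "))
```

If the last record is dropped, gender should no longer appear in the inferred schema, which is why the explicit schema is needed when no record carries the field.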

Or define the schema in detail:

    val schema = StructType(
        StructField("filename", StringType, true) ::
        StructField(
            "attributes",
            StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true))),
            true) :: Nil)
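The nested schema can then be passed to jsonRDD in the same way as the flat one in the question. The following is a minimal sketch under stated assumptions, not a tested recipe: it assumes a Spark 1.2.x spark-shell with sc and sqlContext in scope, the sample records are hypothetical, and the names nested, attributeFields, nestedSchema, detailsRDD and the table name details are made up for illustration. In Spark 1.3 and later the data types live in org.apache.spark.sql.types instead.

```scala
// Assumes spark-shell on Spark 1.2.x, where `sc` and `sqlContext` are predefined.
import org.apache.spark.sql._

// Hypothetical sample records shaped like the nested JSON in the question.
val nested = sc.parallelize(Seq(
  """{"filename":"details","attributes":{"name":"Michael", "age":10}}""",
  """{"filename":"details","attributes":{"name":"Justin"}}"""))

// Inner struct: every attribute is a nullable string, as in the question.
val attributeFields = "name age gender".split(" ")
  .map(fieldName => StructField(fieldName, StringType, true))

// Outer struct: a string field plus the nested "attributes" struct.
val nestedSchema = StructType(
  StructField("filename", StringType, true) ::
  StructField("attributes", StructType(attributeFields), true) :: Nil)

// Apply the explicit schema and query nested fields with dot notation.
val detailsRDD = sqlContext.jsonRDD(nested, nestedSchema)
detailsRDD.printSchema()
detailsRDD.registerTempTable("details")
sqlContext.sql(
  "SELECT filename, attributes.name, attributes.age, attributes.gender FROM details")
  .collect().foreach(println)
```

If this behaves as expected, printSchema() should show attributes as a struct with three nullable string fields, and the resulting SchemaRDD can be written out with saveAsParquetFile(), which takes an output path.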
