How to create DataFrame Schema from a JSON schema file


Question


My use case is to read an existing json-schema file, parse this json-schema file and build a Spark DataFrame schema out of it. To start off I followed the steps mentioned here.

Steps followed

1. Import the library from Maven

2. Restart the cluster

3. Create a sample JSON schema file

4. Read the sample schema file with this code

val schema = SchemaConverter.convert("/FileStore/tables/schemaFile.json")


When I run the above command I get the error: not found: value SchemaConverter


To ensure that the library is being called I reattached the notebook to cluster after restarting the cluster.
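A likely cause of `not found: value SchemaConverter` is a missing import: attaching the library to the cluster puts the class on the classpath but does not bring it into scope in the notebook. Per the library's README the converter lives in the package below (worth double-checking against the installed version):

```scala
import org.zalando.spark.jsonschema.SchemaConverter
```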


In addition to trying out the above method, I tried the below as well. I replaced jsonString with the actual JSON schema.

import org.apache.spark.sql.types.{DataType, StructType}
val newSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]
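A caveat worth noting: `DataType.fromJson` parses Spark's own schema serialization (what `StructType.json` emits), not a draft-04 json-schema document, so it will fail on a json-schema file. A minimal sketch of the format it does accept:

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// DataType.fromJson expects Spark's internal schema JSON, i.e. the
// output of someStructType.json — not a json-schema (draft-04) document.
val sparkSchemaJson =
  """{"type":"struct","fields":[
    |  {"name":"id","type":"long","nullable":false,"metadata":{}},
    |  {"name":"name","type":"string","nullable":true,"metadata":{}}
    |]}""".stripMargin
val newSchema = DataType.fromJson(sparkSchemaJson).asInstanceOf[StructType]
```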


The sample schema I've been playing with has 300+ fields; for simplicity, I used the sample schema from here.

Answer


SchemaConverter works for me. I used spark-shell to test and installed the required package with spark-shell --packages "org.zalando:spark-json-schema_2.11:0.6.1".
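If you prefer a build-based setup over the spark-shell flag, the same artifact can be declared as an sbt dependency (coordinates taken from the command above; assumes a Scala 2.11 build to match the `_2.11` artifact):

```scala
// build.sbt — same artifact as the spark-shell --packages flag above
libraryDependencies += "org.zalando" %% "spark-json-schema" % "0.6.1"
```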

scala> val schema = SchemaConverter.convertContent("""
 | {
 |   "$schema": "http://json-schema.org/draft-04/schema#",
 |   "title": "Product",
 |   "description": "A product from Acme's catalog",
 |   "type": "object",
 |   "properties": {
 |     "id": {
 |       "description": "The unique identifier for a product",
 |       "type": "integer"
 |     },
 |     "name": {
 |       "description": "Name of the product",
 |       "type": "string"
 |     },
 |     "price": {
 |       "type": "number",
 |       "minimum": 0,
 |       "exclusiveMinimum": true
 |     }
 |   },
 |   "required": [
 |     "id",
 |     "name",
 |     "price"
 |   ]
 | }
 | """)

schema: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))

scala> schema.toString
res1: String = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))


Do you want to explicitly specify the schema while reading JSON data? If you read JSON data using Spark, it automatically infers the schema from the data. E.g.

val df = spark.read.json("json-file")
df.printSchema() // Gives schema of json data
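If the goal is to apply the converted schema explicitly rather than rely on inference (useful when some files are missing fields or have inconsistent types), it can be passed to the reader; the path below is a placeholder:

```scala
// Apply an explicit schema instead of inferring one; path is hypothetical.
val df = spark.read.schema(schema).json("/FileStore/tables/data.json")
df.printSchema()
```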
