How to create DataFrame Schema from a JSON schema file
Problem description
My use case is to read an existing json-schema file, parse it, and build a Spark DataFrame schema out of it. To start off, I followed the steps mentioned here.
Steps followed
1. Import the library from Maven
2. Restart the cluster
3. Create a sample JSON schema file
4. Read the sample schema file with this code
val schema = SchemaConverter.convert("/FileStore/tables/schemaFile.json")
When I run the above command I get the error: not found: value SchemaConverter
To ensure that the library is being loaded, I reattached the notebook to the cluster after restarting the cluster.
In addition to trying the above method, I tried the below as well, replacing jsonString with the actual JSON schema.
import org.apache.spark.sql.types.{DataType, StructType}
val newSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]
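Note that `DataType.fromJson` expects Spark's own JSON serialization of a schema (the format produced by `StructType.json`), not a json-schema.org document, so passing it a draft-04 schema will fail. A minimal sketch of the format it does accept (the field names here are illustrative):

```scala
import org.apache.spark.sql.types.{DataType, StructType}

// Spark's own schema serialization, as produced by someSchema.json
val sparkSchemaJson = """
{
  "type": "struct",
  "fields": [
    {"name": "id",   "type": "long",   "nullable": true, "metadata": {}},
    {"name": "name", "type": "string", "nullable": true, "metadata": {}}
  ]
}
"""

val newSchema = DataType.fromJson(sparkSchemaJson).asInstanceOf[StructType]
```

This is why a library such as spark-json-schema is needed to translate between the two formats.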
The sample schema I've been playing with has 300+ fields; for simplicity, I used the sample schema from here.
Recommended answer
SchemaConverter works for me. I used spark-shell to test, installing the required package with spark-shell --packages "org.zalando:spark-json-schema_2.11:0.6.1".
scala> import org.zalando.spark.jsonschema.SchemaConverter
import org.zalando.spark.jsonschema.SchemaConverter

scala> val schema = SchemaConverter.convertContent("""
| {
| "$schema": "http://json-schema.org/draft-04/schema#",
| "title": "Product",
| "description": "A product from Acme's catalog",
| "type": "object",
| "properties": {
| "id": {
| "description": "The unique identifier for a product",
| "type": "integer"
| },
| "name": {
| "description": "Name of the product",
| "type": "string"
| },
| "price": {
| "type": "number",
| "minimum": 0,
| "exclusiveMinimum": true
| }
| },
| "required": [
| "id",
| "name",
| "price"
| ]
| }
| """)
schema: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))
scala> schema.toString
res1: String = StructType(StructField(id,LongType,false), StructField(name,StringType,false), StructField(price,DoubleType,false))
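To convert a schema stored in a file, as in the original question, one option is to read the file contents yourself and pass them to convertContent. A sketch, assuming a Databricks environment where DBFS paths are mounted under /dbfs (the path comes from the question; adjust for your setup):

```scala
import scala.io.Source
import org.zalando.spark.jsonschema.SchemaConverter

// Read the json-schema file and convert it to a Spark StructType.
// The /dbfs prefix is an assumption for Databricks' local file API.
val jsonContent = Source.fromFile("/dbfs/FileStore/tables/schemaFile.json").mkString
val schema = SchemaConverter.convertContent(jsonContent)
```

This avoids relying on how SchemaConverter.convert resolves its input path.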
Do you want to explicitly specify the schema while reading JSON data? If you read JSON data using Spark, it automatically infers the schema from the data, e.g.
val df = spark.read.json("json-file")
df.printSchema() // Gives schema of json data
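Conversely, if you do want to apply the converted schema rather than rely on inference, it can be passed to the reader. A minimal sketch, assuming schema is the StructType produced by SchemaConverter above:

```scala
// Apply an explicit schema instead of letting Spark infer one;
// malformed records are handled per the reader's mode option.
val df = spark.read.schema(schema).json("json-file")
df.printSchema() // shows the supplied schema, not an inferred one
```

Supplying the schema up front also skips the extra pass over the data that inference requires.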