Why does Spark output nullable = true when schema inference is left to Spark, in the case of JSON?
Question
Why does Spark show nullable = true when the schema is not specified and its inference is left to Spark?
// Shows nullable = true even for fields that are present in every JSON record.
spark.read.json("s3://s3path").printSchema()
Going through the class JsonInferSchema, I can see that for StructType, nullable is explicitly set to true, but I am unable to understand the reason behind it.
PS: My aim is to infer the schema for a large JSON data set (< 100GB), and I wanted to see whether Spark provides this ability or whether I would have to write a custom map-reduce job, as highlighted in the paper Schema Inference for Massive JSON Datasets. One major part is that I want to know which fields are optional and which are mandatory (with respect to the data set).
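The "optional vs. mandatory" part of the question can be answered with a full scan rather than sampling: a field is mandatory only if it is present and non-null in every record. A minimal plain-Python sketch of that rule, using hypothetical inline records in place of the real S3 data (for 100GB you would express the same logic in Spark):

```python
import json

# Hypothetical sample of JSON-lines records (in practice these would be
# read from s3://s3path); "id" and "name" appear non-null in every record,
# while "note" is missing from one record and null in another.
records = [
    '{"id": 1, "name": "a", "note": "x"}',
    '{"id": 2, "name": "b"}',
    '{"id": 3, "name": "c", "note": null}',
]

parsed = [json.loads(r) for r in records]

# A field is mandatory iff it is present and non-null in every record;
# every other field seen at least once is optional.
all_fields = set().union(*(p.keys() for p in parsed))
mandatory = {f for f in all_fields
             if all(f in p and p[f] is not None for p in parsed)}
optional = all_fields - mandatory

print(sorted(mandatory))  # ['id', 'name']
print(sorted(optional))   # ['note']
```

The same per-field present-and-non-null test distributes naturally over partitions (an AND-reduce per field), which is why the paper frames it as a map-reduce job.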
Answer
Because Spark may sample the data for schema inference, it cannot infer with 100% certainty whether a field is nullable or not, given the limited checking scope of a sample. It is therefore safer to set nullable = true. It's that simple.
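A tiny plain-Python illustration of why a sample cannot prove non-nullability, using hypothetical data (the actual sampling inside Spark is more involved; this only shows the logical gap):

```python
# Hypothetical data set: "score" is null only in the last record, so any
# sample that misses that record would wrongly conclude it is non-nullable.
records = [{"score": i} for i in range(9)] + [{"score": None}]

sample = records[:5]  # a prefix sample, standing in for whatever a sampler takes
nullable_in_sample = any(r["score"] is None for r in sample)
nullable_in_full = any(r["score"] is None for r in records)

print(nullable_in_sample)  # False: the sample suggests non-nullable
print(nullable_in_full)    # True: only the full scan reveals the null
```

Since a null can always hide outside the sampled records, declaring nullable = true is the only safe default.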