Nullable field is changed upon writing a Spark DataFrame

Question

The following code reads a Spark DataFrame from a Parquet file and writes it to another Parquet file. After the DataFrame is written to the new Parquet file, the nullable flag of the ArrayType data type (its containsNull flag) has changed. Code:

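    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.types.StructField;

    // Local Spark 1.6 setup
    SparkConf sparkConf = new SparkConf();
    String master = "local[2]";
    sparkConf.setMaster(master);
    sparkConf.setAppName("Local Spark Test");
    JavaSparkContext sparkContext = new JavaSparkContext(new SparkContext(sparkConf));
    SQLContext sqc = new SQLContext(sparkContext);

    // Read the original file and print the type of the third column
    DataFrame dataFrame = sqc.read().parquet("src/test/resources/users.parquet");
    StructField[] fields = dataFrame.schema().fields();
    System.out.println(fields[2].dataType());

    // Write the DataFrame back out unchanged
    dataFrame.write().mode(SaveMode.Overwrite).parquet("src/test/resources/users1.parquet");

    // Read the copy and print the same column's type again
    DataFrame dataFrame1 = sqc.read().parquet("src/test/resources/users1.parquet");
    StructField[] fields1 = dataFrame1.schema().fields();
    System.out.println(fields1[2].dataType());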

Output:

    ArrayType(IntegerType,false)
    ArrayType(IntegerType,true)

Spark version is: 1.6.2

Answer

In Spark 2.4 and earlier, all columns written by Spark SQL are nullable. Quoting the official guide:

Parquet is a columnar format that is supported by many other data processing systems. Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema of the original data. When writing Parquet files, all columns are automatically converted to be nullable for compatibility reasons.
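
The output in the question shows that the flag inside the array type changed too, not only the column's own nullable flag. The following is a minimal Java sketch of that conversion. It assumes the "converted to be nullable" step is applied recursively to nested types, so ArrayType's containsNull also becomes true. This assumption matches the observed output. The classes below are a hypothetical toy model and not Spark's own org.apache.spark.sql.types classes. The column name numbers is also made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of Spark SQL schema types; NOT Spark's own classes.
// It mimics the rule quoted above: on write, every nullability flag,
// including nested ones such as ArrayType.containsNull, is forced to true.
public class NullableSchemaSketch {

    interface DataType {
        // Returns a copy of this type with all nested nullability flags set to true.
        DataType asNullable();
    }

    static final class IntegerType implements DataType {
        static final IntegerType INSTANCE = new IntegerType();

        private IntegerType() {}

        // Atomic types carry no nullability flag of their own.
        public DataType asNullable() { return this; }

        @Override
        public String toString() { return "IntegerType"; }
    }

    static final class ArrayType implements DataType {
        final DataType elementType;
        final boolean containsNull; // may array elements be null?

        ArrayType(DataType elementType, boolean containsNull) {
            this.elementType = elementType;
            this.containsNull = containsNull;
        }

        // Widen the element type recursively, then allow null elements.
        public DataType asNullable() {
            return new ArrayType(elementType.asNullable(), true);
        }

        @Override
        public String toString() {
            return "ArrayType(" + elementType + "," + containsNull + ")";
        }
    }

    static final class StructField {
        final String name;
        final DataType dataType;
        final boolean nullable; // may the column value itself be null?

        StructField(String name, DataType dataType, boolean nullable) {
            this.name = name;
            this.dataType = dataType;
            this.nullable = nullable;
        }

        StructField asNullable() {
            return new StructField(name, dataType.asNullable(), true);
        }
    }

    // Applies the widening to every top-level column of a schema.
    static List<StructField> asNullable(List<StructField> schema) {
        List<StructField> out = new ArrayList<>();
        for (StructField f : schema) {
            out.add(f.asNullable());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical column mirroring fields[2] in the question.
        List<StructField> before = Arrays.asList(
            new StructField("numbers", new ArrayType(IntegerType.INSTANCE, false), false));
        List<StructField> after = asNullable(before);

        System.out.println(before.get(0).dataType); // ArrayType(IntegerType,false)
        System.out.println(after.get(0).dataType);  // ArrayType(IntegerType,true)
    }
}
```

In practice, a non-nullable schema does not survive the Parquet round trip. If later code depends on those flags, one option is to re-apply the original StructType after reading. You could do that, for example, with sqc.createDataFrame(dataFrame1.javaRDD(), dataFrame.schema()). This only makes sense if you know the data contains no nulls, because Spark will then trust the stricter schema.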
