Uploading a custom schema from a CSV file using PySpark

Views: 31 | Published: 2021/11/14 23:22:21 | Tags: python-3.x apache-spark pyspark apache-spark-sql schema

Problem description

I have a question about loading a schema in CDSW (Cloudera Data Science Workbench) using PySpark. I have a DataFrame that is created from a CSV file:

        data_1 = spark.read.csv("demo.csv", sep=",", header=True, inferSchema=True)

The data types are inferred incorrectly for most of the columns (around 60 of them), and I can't keep fixing them by hand every time. I know what the schema should look like.
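
Before writing the schema out, it can help to list exactly which columns were inferred wrongly. A minimal sketch, assuming the expected types are kept in a hypothetical dict of Spark type strings; the input has the shape of `DataFrame.dtypes` (a list of `(column_name, type_string)` pairs), and the sample values are made up for illustration.

```python
def mismatched_columns(actual_dtypes, expected):
    """Return (column, inferred, expected) for every column whose type differs.

    `actual_dtypes` has the shape of DataFrame.dtypes, e.g. [("COL2", "double")].
    `expected` is a hypothetical dict mapping column name -> expected type string.
    Columns missing from `expected` are ignored.
    """
    mismatches = []
    for name, inferred in actual_dtypes:
        want = expected.get(name)
        # Compare case-insensitively, since "INT" and "int" name the same type
        if want is not None and inferred.lower() != want.lower():
            mismatches.append((name, inferred, want))
    return mismatches


# Made-up stand-in for data_1.dtypes after reading with inferSchema=True
inferred = [("COL1", "int"), ("COL2", "double"), ("COL3", "decimal(20,10)")]
expected = {"COL1": "string", "COL2": "decimal(20,10)", "COL3": "decimal(20,10)"}
print(mismatched_columns(inferred, expected))
```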

Is there a way to load the schema from a CSV file as well, so that Spark reads the dataset and applies the schema I upload instead of the inferred types?

Recommended answer

Read the file with a custom schema, so that you can define exactly which data type each column should have:

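        from pyspark.sql.types import StructType, StructField, StringType, DecimalType

        # One StructField(name, type, nullable) per column
        schema = StructType([
            StructField("COL1", StringType(), True),
            StructField("COL2", DecimalType(20, 10), True),
            StructField("COL3", DecimalType(20, 10), True),
        ])

        # file_path is the path to the CSV file, e.g. "demo.csv".
        # The explicit schema replaces inference; header=True skips the header row.
        df = spark.read.schema(schema).csv(file_path, header=True)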
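
The question also asks whether the schema itself can be loaded from a CSV file. A minimal sketch, assuming a hypothetical schema file with a header row `name,type` and one Spark SQL type name per row (types that contain a comma, such as `DECIMAL(20,10)`, must be quoted in the CSV): the standard `csv` module turns it into a DDL-formatted schema string. The file layout and the function name `ddl_from_schema_csv` are illustrative, not part of the original answer.

```python
import csv
import io


def ddl_from_schema_csv(lines):
    """Build a Spark DDL schema string from a hypothetical `name,type` CSV.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    """
    fields = []
    for row in csv.DictReader(lines):
        name = row["name"].strip()
        dtype = row["type"].strip()
        # Backtick-quote the name so spaces are allowed; a literal backtick is doubled
        quoted = "`" + name.replace("`", "``") + "`"
        fields.append(f"{quoted} {dtype}")
    return ", ".join(fields)


# In practice: with open("schema.csv", newline="") as f: ddl = ddl_from_schema_csv(f)
schema_csv = io.StringIO(
    "name,type\n"
    "COL1,STRING\n"
    'COL2,"DECIMAL(20,10)"\n'
    'COL3,"DECIMAL(20,10)"\n'
)
ddl = ddl_from_schema_csv(schema_csv)
print(ddl)
```

`DataFrameReader.schema` accepts a DDL-formatted string as well as a `StructType`, so the result can be passed straight to the reader, for example `spark.read.schema(ddl).csv("demo.csv", header=True)`. Building a string rather than `StructType` objects keeps the type names in the schema file in plain Spark SQL syntax, so no hand-written mapping from names to type classes is needed.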
