Can I change the datatype of the Spark dataframe columns that is being loaded to SQL DataWare House as a table?
Question
I'm trying to read a Parquet file from Azure Data Lake using the following PySpark code:
df = sqlContext.read.format("parquet") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("adl://xyz/abc.parquet")
df = df['Id', 'IsDeleted']
Now I want to load this dataframe df as a table into SQL Data Warehouse using the following code:
df.write \
.format("com.databricks.spark.sqldw") \
.mode('overwrite') \
.option("url", sqlDwUrlSmall) \
.option("forward_spark_azure_storage_credentials", "true") \
.option("dbtable", "test111") \
.option("tempdir", tempDir) \
.save()
This creates a table dbo.test111 in the SQL Datawarehouse with datatypes:
- Id (nvarchar(256), null)
- IsDeleted (bit, null)
But I need these columns with different datatypes, say char(255) and varchar(128), in SQL Datawarehouse. How do I do this while loading the dataframe into SQL Data Warehouse?
Recommended Answer
You can achieve this in PySpark by using the cast method with a DataType instance. After casting the columns, you can write the dataframe to the table in SQL Data Warehouse.
There's a similar thread where you can read about casting:
https://stackoverflow.com/questions/32284620/how-to-change-a-dataframe-column-from-string-type-to-double-type-in-pyspark
Let us know if this helps. Otherwise, we can gladly continue to probe further.