Write dataframe to csv with datatype map<string,bigint> in Spark
Problem description
I have a file, file1.snappy.parquet, which has a complex data structure: a map with an array inside it. After processing it I got a final result, but while writing that result to CSV I got an error saying
"Exception in thread "main" java.lang.UnsupportedOperationException: CSV data source does not support map<string,bigint> data type."
The code I used:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions._

val conf = new SparkConf().setAppName("student-example").setMaster("local")
val sc = new SparkContext(conf)
val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
val datadf = sqlcontext.read.parquet("C:\\file1.snappy.parquet")
// Sum the "aggr" entry of the map column when it is present
def sumaggr = udf((aggr: Map[String, collection.mutable.WrappedArray[Long]]) => if (aggr.keySet.contains("aggr")) aggr("aggr").sum else 0L)
datadf.select(col("neid"), sumaggr(col("marks")).as("sum")).filter(col("sum") =!= 0).show(false)
// The line below throws: datadf still contains the map<string,bigint> column
datadf.write.format("com.databricks.spark.csv").option("header", "true").save("C:\\myfile.csv")
I tried converting with datadf.toString() but I am still facing the same issue. How can I write that result to CSV?
Recommended answer
The Spark CSV source supports only atomic types. You cannot store any columns that are non-atomic, such as map, array, or struct.
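As a minimal illustration of that limitation (reusing the question's column names, and assuming neid is an atomic type while marks is the map column; the output paths are hypothetical), the same writer succeeds or fails depending on which columns are selected:

// Atomic columns only: this write succeeds
datadf.select(col("neid")).write.format("com.databricks.spark.csv").option("header", "true").save("C:\\neid_only.csv")
// Including the map<string,bigint> column reproduces the UnsupportedOperationException
datadf.select(col("neid"), col("marks")).write.format("com.databricks.spark.csv").option("header", "true").save("C:\\with_marks.csv")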
I think it is best to create JSON for the column that has map<string,bigint> as its datatype, and save it in the CSV as below.
import spark.implicits._
import org.apache.spark.sql.functions._

// Serialize the map column to a JSON string, which the CSV source can store
datadf.withColumn("column_name_with_map_type", to_json(struct($"column_name_with_map_type"))).write.csv("outputpath")
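If the map needs to be recovered later, the stored JSON string can be parsed back with from_json. A minimal sketch, assuming the write above also sets .option("header", "true") (otherwise the column is read back as _c0) and that the map values are bigint (Long):

import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Schema of the JSON produced by to_json(struct(...)) above
val jsonSchema = new StructType().add("column_name_with_map_type", MapType(StringType, LongType))

// Parse the JSON string back into a map<string,bigint> column
val restored = spark.read.option("header", "true").csv("outputpath")
  .withColumn("parsed", from_json($"column_name_with_map_type", jsonSchema))
  .withColumn("column_name_with_map_type", $"parsed".getField("column_name_with_map_type"))
  .drop("parsed")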
Hope this helps!