Custom schema in spark-csv throwing error in Spark 1.4.1

Views: 186 | Published: 2016/5/22 16:32:05 | Tags: apache-spark, spark-dataframe, spark-csv


Problem description


I am trying to process a CSV file using the spark-csv package in spark-shell on Spark 1.4.1.

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> import org.apache.spark.sql.hive.orc._
import org.apache.spark.sql.hive.orc._

scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType};
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/12/21 02:06:24 WARN SparkConf: The configuration key 'spark.yarn.applicationMaster.waitTries' has been deprecated as of Spark 1.3 and and may be removed in the future. Please use the new key 'spark.yarn.am.waitTime' instead.
15/12/21 02:06:24 INFO HiveContext: Initializing execution hive, version 0.13.1
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@74cba4b

scala> val customSchema = StructType(Seq(StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true)))
customSchema: org.apache.spark.sql.types.StructType = StructType(StructField(year,IntegerType,true), StructField(make,StringType,true), StructField(model,StringType,true), StructField(comment,StringType,true), StructField(blank,StringType,true))

scala> val customSchema = (new StructType).add("year", IntegerType, true).add("make", StringType, true).add("model", StringType, true).add("comment", StringType, true).add("blank", StringType, true)
<console>:24: error: not enough arguments for constructor StructType: (fields: Array[org.apache.spark.sql.types.StructField])org.apache.spark.sql.types.StructType.
Unspecified value parameter fields.
       val customSchema = (new StructType).add("year", IntegerType, true).add("make", StringType, true).add("model", StringType,true).add("comment", StringType, true).add("blank", StringType, true)

Solution

According to the Spark 1.4.1 documentation, StructType has no no-arg constructor, which is why you are getting the error: the only constructor in that version takes an Array[StructField]. Either upgrade to 1.5.x to get the no-arg constructor, or create the schema the way you did in your first example:

val customSchema = StructType(Seq(StructField("year", IntegerType, true),StructField("make", StringType, true),StructField("model", StringType, true),StructField("comment", StringType, true),StructField("blank", StringType, true)))
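If the chained style is what you are after, a small helper can give a similar feel on 1.4.1 while still going through the Array[StructField] constructor named in the error message. The following is only a sketch: the object name SchemaSketch, the helper schemaOf, and the readCsv function are hypothetical names introduced for illustration, and the "header" option assumes the CSV file has a header row.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructField, StructType}

object SchemaSketch {
  // Hypothetical helper: builds a schema from (name, type) pairs.
  // It calls the Array[StructField] constructor, the one the 1.4.1
  // compiler error reports as the available signature.
  def schemaOf(columns: (String, DataType)*): StructType =
    new StructType(columns.map { case (name, dataType) =>
      StructField(name, dataType, nullable = true)
    }.toArray)

  // Same five nullable columns as in the question.
  val customSchema: StructType = schemaOf(
    "year"    -> IntegerType,
    "make"    -> StringType,
    "model"   -> StringType,
    "comment" -> StringType,
    "blank"   -> StringType
  )

  // Reads a CSV file through the spark-csv data source with the schema above.
  // Assumes the spark-csv package is on the classpath and that the file
  // starts with a header row; `path` is supplied by the caller.
  def readCsv(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .schema(customSchema)
      .load(path)
}
```

In spark-shell you could then call SchemaSketch.readCsv(hiveContext, path) with your own file path, since HiveContext is a subclass of SQLContext.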
