How to work with Dataset in Spark using Scala?
Question
I load my CSV into a DataFrame and then convert it to a Dataset, but it shows errors like this:
Multiple markers at this line:
- Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing
spark.implicits._ Support for serializing other types will be added in future releases.
- not enough arguments for method as: (implicit evidence$2:
org.apache.spark.sql.Encoder[DataSet.spark.aacsv])org.apache.spark.sql.Dataset[DataSet.spark.aacsv]. Unspecified value parameter evidence$2
How do I resolve this? My code is:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

case class aaCSV(
  a: String,
  b: String
)

object WorkShop {
  def main(args: Array[String]) = {
    val conf = new SparkConf()
      .setAppName("readCSV")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val customSchema = StructType(Array(
      StructField("a", StringType, true),
      StructField("b", StringType, true)))
    val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").schema(customSchema).load("/xx/vv/ss.csv")
    df.printSchema()
    df.show()
    val googleDS = df.as[aaCSV]
    googleDS.show()
  }
}
Now I changed the main function like this:
def main(args: Array[String]) = {
  val conf = new SparkConf()
    .setAppName("readCSV")
    .setMaster("local")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._
  val sa = sqlContext.read.csv("/xx/vv/ss.csv").as[aaCSV]
  sa.printSchema()
  sa.show()
}
But it throws an error - Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Adj_Close' given input columns: [_c1, _c2, _c5, _c4, _c6, _c3, _c0]; line 1 pos 7. What should I do?
I also execute my method at a given time interval using the Spark scheduler, following this link - https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application. Kindly help us.
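For scheduling within a single application, the docs linked above describe switching from the default FIFO scheduler to the FAIR scheduler. A minimal sketch (assuming the same SparkConf as in the code above; the pool name "csvJobs" is hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Enable fair scheduling so concurrent jobs within this application
// share resources instead of running strictly first-in-first-out.
val conf = new SparkConf()
  .setAppName("readCSV")
  .setMaster("local")
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

// Optionally assign jobs submitted from this thread to a named pool.
sc.setLocalProperty("spark.scheduler.pool", "csvJobs")
```

This only changes how jobs are scheduled once submitted; triggering the method at a time interval still needs an external timer or scheduler around the Spark job.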
Answer
Do you have a header (column names) in your CSV files? If yes, try adding .option("header", "true") in the read statement.

Example:

sqlContext.read.option("header", "true").csv("/xx/vv/ss.csv").as[aaCSV]
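Putting it together, a minimal end-to-end sketch (assuming the CSV's header row names the columns a and b, matching the case class) might look like:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Define the case class at top level (not inside main),
// so Spark can derive an Encoder for it.
case class aaCSV(a: String, b: String)

object WorkShop {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("readCSV").setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings Encoder[aaCSV] into scope for .as[aaCSV]

    val ds = sqlContext.read
      .option("header", "true") // use the first CSV row as column names
      .csv("/xx/vv/ss.csv")
      .as[aaCSV] // succeeds once the columns are named a and b
    ds.show()
  }
}
```

Without the header option, Spark names the columns _c0, _c1, ..., which is exactly why .as[aaCSV] failed with "cannot resolve" in the question above.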
The blog below has different examples for DataFrames and Datasets: http://technippet.blogspot.in/2016/10/different-ways-of-creating.html