How to create Spark/Scala project in IntelliJ IDEA (fails to resolve dependencies in build.sbt)?
Question
I'm trying to build and run a Scala/Spark project in IntelliJ IDEA.
I have added org.apache.spark:spark-sql_2.11:2.0.0 in Global Libraries, and my build.sbt looks like the following.
name := "test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.0.0"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "2.0.0"
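As an aside, the same dependencies are often declared with sbt's `%%` operator, which appends the Scala binary version (`_2.11`) automatically from `scalaVersion`, so the suffix cannot drift out of sync. A sketch of the equivalent build.sbt:

```scala
// build.sbt — equivalent declaration using %%; sbt derives the _2.11
// artifact suffix from scalaVersion, so the names stay consistent.
name := "test"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
```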
I still get the error "unknown artifact. unable to resolve or indexed" on spark-sql.
When I tried to build the project, the error was:
Error:(19, 26) not found: type sqlContext
val sqlContext = new sqlContext(sc)
I have no idea what the problem could be. How do I create a Spark/Scala project in IntelliJ IDEA?
Update:
Following the suggestions, I updated the code to use SparkSession, but it is still unable to read a csv file. What am I doing wrong here? Thank you!
val spark = SparkSession
.builder()
.appName("Spark example")
.config("spark.some.config.option", "some value")
.getOrCreate()
import spark.implicits._
val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show() //it doesn't show anything
//pdf.select("DATE_KEY").show()
Answer
The type name needs upper-case letters, SQLContext, as below:
val sqlContext = new SQLContext(sc)
SQLContext is deprecated in newer versions of Spark, so I would suggest you use SparkSession instead:
val spark = SparkSession.builder().appName("testings").getOrCreate
val sqlContext = spark.sqlContext
If you want to set the master through your code instead of from the spark-submit command, then you can set .master as well (you can set configs too):
val spark = SparkSession.builder().appName("testings").master("local").config("configuration key", "configuration value").getOrCreate
val sqlContext = spark.sqlContext
Update
Looking at your sample data
DATE|PID|TYPE
8/03/2017|10199786|O
and testing your code
val testdf = spark.read.csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf.show()
my output is:
+--------------------+
| _c0|
+--------------------+
| DATE|PID|TYPE|
|8/03/2017|10199786|O|
+--------------------+
Now adding .option for delimiter and header, as:
val testdf2 = spark.read.option("delimiter", "|").option("header", true).csv("/Users/H/Desktop/S_CR_IP_H.dat")
testdf2.show()
Output:
+---------+--------+----+
| DATE| PID|TYPE|
+---------+--------+----+
|8/03/2017|10199786| O|
+---------+--------+----+
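The delimiter/header behavior above can be illustrated without Spark at all. This is a hypothetical standalone sketch in plain Scala showing how one pipe-delimited record pairs up with the header row once the "|" delimiter is honored:

```scala
// Standalone sketch (no Spark needed): split the header and a data row
// on "|", then zip them into a column-name -> value map.
val header = "DATE|PID|TYPE".split('|')
val row = "8/03/2017|10199786|O".split('|')
val record = header.zip(row).toMap
println(record("DATE")) // prints 8/03/2017
println(record("PID"))  // prints 10199786
```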
Note: I have used .master("local") to get the SparkSession object.