How to create pom.xml for maven using SparkSql and Hive?
Question
I have created a Maven project for SparkSql and Hive connectivity and written the following example code:
SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate();

try {
    spark.sql("select * from health").show();
} catch (AnalysisException e) {  // org.apache.spark.sql.AnalysisException: thrown when the table is not found
    System.out.println("table not found");
}
I am using Spark 2.1.0 and Hive 1.2.1.
To run the above code, I imported the JAR files from the Spark folder and included them in the project; I did not use a Maven pom.xml for this particular job. But when I move to bigger clusters, such as on AWS, I need to run my JAR file.
It fails to run because Maven cannot find the dependencies, so I thought of adding them. I tried this:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.2.1</version>
</dependency>
But it didn't work, and I no longer see the output I was previously getting by adding the JAR files.
I want to know whether I did anything wrong; if so, please suggest what to do. Also, following the Spark instructions from the documentation, how can I add hive-site.xml and hdfs-site.xml to my project via pom.xml? I am currently using IntelliJ. Please let me know what I can do to resolve my issue.
Answer
I see there is a misconfiguration of dependencies.
In your Maven dependencies, spark-sql and spark-hive are at version 1.2.1, but spark-core is at version 2.1.0.
Change all the dependencies to the same version number and it should work:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
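One way to keep the three artifacts from drifting apart again is to define the Spark version once as a Maven property and reference it in each dependency. A minimal sketch (the property name `spark.version` is just a convention, not something Maven requires):

```xml
<properties>
    <spark.version>2.1.0</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
```

With this layout, upgrading Spark later means changing a single line instead of three.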
spark-core dependency: http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/2.1.0
spark-sql dependency: http://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10/2.1.0
spark-hive dependency: http://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10/2.1.0
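As for the hive-site.xml and hdfs-site.xml part of the question: these are configuration files, not dependencies, so they are not declared in pom.xml. Per the Spark documentation, they only need to be on the application classpath; in a standard Maven layout that means placing them under src/main/resources. A sketch of the assumed project layout (names are illustrative):

```
my-spark-app/
├── pom.xml
└── src/
    └── main/
        ├── java/
        │   └── ...  (the SparkSession code above)
        └── resources/
            ├── hive-site.xml
            └── hdfs-site.xml
```

Maven copies everything in src/main/resources into the JAR, so the files travel with the application when it is submitted to a cluster.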