How to create pom.xml for Maven using SparkSql and Hive?


Question

I have created a Maven project for SparkSql and Hive connectivity and written the following example code:

import org.apache.spark.sql.SparkSession;

// Build a SparkSession with Hive support, pointing at the local Hive metastore.
SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("local[*]")
        .config("hive.metastore.uris", "thrift://localhost:9083")
        .enableHiveSupport()
        .getOrCreate();

try {
    spark.sql("select * from health").show();
} catch (Exception e) {
    // spark.sql throws an AnalysisException when the table does not exist
    System.out.println("table not found");
}

I am using Spark 2.1.0 and Hive 1.2.1.

To run the above code, I imported the JAR files from the Spark folder and included them in the project; I haven't used a Maven pom.xml for this particular job. But when I move to bigger clusters, such as on AWS, I need to run my JAR file.

I am not able to run it, because Maven cannot find the dependencies. So I thought of adding the dependencies, and I tried this:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.2.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.2.1</version>
</dependency>

But it didn't work, and I am no longer able to see the output I was previously getting by adding the JAR files.
I want to know whether I did anything wrong; if yes, please suggest what to do. Also, as per the instructions in the Spark documentation, how can I add hive-site.xml and hdfs-site.xml to my project in pom.xml? I am currently using IntelliJ. Please let me know what I can do to resolve my issue.

Answer

I found the mistake: it is a version mismatch.

In your Maven dependencies, spark-sql and spark-hive are version 1.2.1, but spark-core is version 2.1.0.

Change all the dependencies to the same version number, and it should work:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>2.1.0</version>
</dependency>
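
For completeness, a minimal pom.xml wrapping these dependencies might look like the sketch below. The project's own groupId, artifactId, and version here are placeholders, not something from the original question:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Placeholder coordinates: replace with your own project's values -->
    <groupId>com.example</groupId>
    <artifactId>spark-hive-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>2.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.10</artifactId>
            <version>2.1.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
</project>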

spark-core dependency: http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/2.1.0
spark-sql dependency: http://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10/2.1.0
spark-hive dependency: http://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.10/2.1.0
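
As for hive-site.xml and hdfs-site.xml from the question: those are configuration files, not Maven artifacts, so they are not declared as dependencies. A common approach (an assumption on my part, not part of the original answer) is to place them in src/main/resources, which Maven puts on the application classpath by default; the equivalent explicit configuration would be:

<build>
    <resources>
        <!-- src/main/resources is already the Maven default; shown explicitly here.
             Put hive-site.xml and hdfs-site.xml in this directory so they end up
             on the classpath of the packaged JAR. -->
        <resource>
            <directory>src/main/resources</directory>
        </resource>
    </resources>
</build>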

