Apache Spark Hive, executable JAR with maven-shade

Problem description


I'm building an apache-spark application with Apache Spark Hive. So far everything has been fine: I've been running tests and the whole application in IntelliJ IDEA, and running all the tests together using Maven.


Now I want to run the whole application from bash and have it run against a local single-node cluster. I'm using maven-shade-plugin to build a single executable JAR.


The application crashes when it tries to create a new HiveContext out of the SparkContext. The thrown exception tells me that Hive can't create its metastore because there is some problem with DataNucleus and its plugin system. I tried to follow several questions on how to make the DataNucleus plugin system work inside a shaded JAR, but with no luck. For example: Datanucleus, JDO and executable jar - how to do it?


What is the best way to compose an executable JAR of an application that uses Hive and run it from bash? Perhaps some settings for DataNucleus and its plugin system?

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>test</groupId>
    <artifactId>hive-test</artifactId>
    <version>1.0.0</version>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.7</version>
        </dependency>

        <!-- spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.6.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>1.6.0</version>
        </dependency>
    </dependencies>

    <properties>
        <!-- To be specified in child pom:  <main.class></main.class> -->
        <!-- Not defined in the original pom; 1.8 is assumed here so that the
             compiler plugin's ${java.version} placeholder resolves to a valid
             source/target level instead of the JVM's full version string -->
        <java.version>1.8</java.version>
        <final.jar.name>${project.artifactId}-${project.version}</final.jar.name>
        <main.class>com.test.HiveTest</main.class>
    </properties>

    <build>
        <plugins>
            <!-- the Maven compiler plugin will compile Java source files -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>

            <!-- the Maven Scala plugin will compile Scala source files -->
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals>
                            <goal>add-source</goal>
                            <goal>compile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-Xmax-classfile-name</arg>
                                <arg>110</arg>
                            </args>
                        </configuration>
                    </execution>
                    <execution>
                        <id>scala-test-compile</id>
                        <phase>process-test-resources</phase>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-Xmax-classfile-name</arg>
                                <arg>110</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>${main.class}</mainClass>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.XmlAppendingTransformer">
                                    <resource>plugin.xml</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            </transformers>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

Sample code

package com.test

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveTest {
  def main(args: Array[String]) {
    // Configure Spark to run in local mode
    val conf = new SparkConf(true)
      .setMaster("local")
      .setAppName("hive-test")

    println("Initializing spark context")
    val sc = new SparkContext(conf)

    // Fails here when run from the shaded JAR (see the exception below)
    println("Initializing hive context")
    val hc = new HiveContext(sc)
  }
}

Thrown exception

java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
        at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
        at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
        at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
        at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
        at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
        at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
        at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:97)
        at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
        at com.test.HiveTest$.main(HiveTest.scala:21)
        at com.test.HiveTest.main(HiveTest.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        ... 12 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
        ... 18 more
Caused by: javax.jdo.JDOFatalInternalException: Unexpected exception caught.
NestedThrowables:
java.lang.reflect.InvocationTargetException
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1193)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:624)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        ... 23 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
        ... 42 more
Caused by: org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:283)
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:247)
        at org.datanucleus.NucleusContext.<init>(NucleusContext.java:225)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.<init>(JDOPersistenceManagerFactory.java:416)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:301)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
        ... 50 more

Thanks in advance

Answer


I have found a solution to my problem. The solution is described in the answer to the original question about DataNucleus in an executable jar: https://stackoverflow.com/a/27030103/6390361


  1. Edit MANIFEST.MF to pretend that the JAR is a DataNucleus OSGi bundle. This can be done by adding the Bundle-SymbolicName and Premain-Class entries from the datanucleus-core manifest.


  2. Create a file plugin.xml on your classpath (resources folder) and use the root plugin tag from the datanucleus-core project.


  3. Put all extension-point tags from datanucleus-core and datanucleus-rdbms at the beginning of the plugin tag. All extension-points from the RDBMS project have to be prefixed with store.rdbms. This is very important because DataNucleus uses fully qualified IDs that include the id of the root plugin tag.


  4. Merge all extension tags from the projects datanucleus-core, datanucleus-rdbms and datanucleus-api-jdo and put them after all the extension-points. Be careful: some extensions are present in more than one project, so you need to merge the content of extensions that share the same ID, as in the sketch below.
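
For illustration, a minimal sketch of the merge in step 4, using the org.datanucleus.persistence_properties extension point, which exists in both datanucleus-core and datanucleus-rdbms. The child elements are elided here; copy the real ones from each project's own plugin.xml:

<!-- In datanucleus-core's plugin.xml: -->
<extension point="org.datanucleus.persistence_properties">
    <!-- persistence property definitions from datanucleus-core ... -->
</extension>

<!-- In datanucleus-rdbms's plugin.xml: -->
<extension point="org.datanucleus.persistence_properties">
    <!-- persistence property definitions from datanucleus-rdbms ... -->
</extension>

<!-- In your merged plugin.xml, one extension tag holds both sets of children: -->
<extension point="org.datanucleus.persistence_properties">
    <!-- persistence property definitions from datanucleus-core ... -->
    <!-- persistence property definitions from datanucleus-rdbms ... -->
</extension>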

Manifest entries

Bundle-SymbolicName: org.datanucleus;singleton:=true
Premain-Class: org.datanucleus.enhancer.DataNucleusClassFileTransformer

plugin.xml


The full plugin.xml file is too big to be pasted here, but you should be able to merge it by hand. The following code contains all the RDBMS extension points with fixed IDs.

<?xml version="1.0" encoding="UTF-8"?>
<?eclipse version="3.2"?>    
<plugin id="org.datanucleus" name="DataNucleus Core" provider-name="DataNucleus">

    <!-- Extension points from datanucleus-core -->
    <extension-point id="api_adapter" name="Api Adapter" schema="schema/apiadapter.exsd"/>
    ...

    <!-- extension points from datanucleus-rdbms - fixed IDs -->
    <extension-point id="store.rdbms.connectionprovider" name="Connection Provider" schema="schema/connectionprovider.exsd"/>
    <extension-point id="store.rdbms.connectionpool" name="ConnectionPool" schema="schema/connectionpool.exsd"/>
    <extension-point id="store.rdbms.sql_expression" name="SQL Expressions" schema="schema/sql_expression.exsd"/>
    <extension-point id="store.rdbms.sql_method" name="SQL Methods" schema="schema/sql_method.exsd"/>
    <extension-point id="store.rdbms.sql_operation" name="SQL Expressions" schema="schema/sql_operation.exsd"/>
    <extension-point id="store.rdbms.sql_tablenamer" name="SQL Table Namer" schema="schema/sql_tablenamer.exsd"/>
    <extension-point id="store.rdbms.rdbms_mapping" name="RDBMS Mapping" schema="schema/rdbms_mapping.exsd"/>

    <!-- Merged extensions from datanucleus-core, datanucleus-rdbms and datanucleus-api-jdo -->
    <extension point="org.datanucleus.persistence_properties">...</extension>
    ...
</plugin>

maven-shade-plugin

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <manifestEntries>
                            <Main-Class>${main.class}</Main-Class>
                            <Premain-Class>org.datanucleus.enhancer.DataNucleusClassFileTransformer</Premain-Class>
                            <Bundle-SymbolicName>org.datanucleus;singleton:=true</Bundle-SymbolicName>
                        </manifestEntries>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                        <resource>reference.conf</resource>
                    </transformer>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
            </configuration>
        </execution>
    </executions>
</plugin>
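
With the manifest entries, the merged plugin.xml and the shade configuration above in place, the JAR can be built and launched from bash. This is a minimal sketch; it assumes the shaded JAR keeps the default output name derived from the pom in the question (artifactId-version):

# Build the shaded executable JAR
mvn clean package

# Run the application locally; with no external metastore configured,
# Hive creates its metastore_db directory in the working directory
java -jar target/hive-test-1.0.0.jar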
