Package a multiple-entry jar using Maven for a Hadoop project


Question


    I'm new to Maven. I want to package a jar of my Hadoop project together with its dependencies, and then use it like this:

    hadoop jar project.jar com.abc.def.SomeClass1 -params ...
    hadoop jar project.jar com.abc.def.AnotherClass -params ...
    

    I also want this jar to have multiple entry points (different Hadoop jobs).

    How could I do it?

    Thanks!

    Solution

    There are two ways to create a jar with dependencies:

    1. Hadoop supports a jar-in-jar format, meaning that your jar can contain a lib folder of jars that will be added to the classpath at job submission and at map / reduce task execution.
    2. You can unpack the dependency jars and re-pack them with your classes into a single monolithic jar.

    The first requires you to write a Maven assembly descriptor file, and in practice it is more hassle than it's worth. The second also uses the Maven assembly plugin, but relies on a built-in descriptor. To use the second, add the following to the project -> build -> plugins section of your pom:

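    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <!-- Bind the single goal to the package phase so that mvn package builds the assembly -->
        <execution>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>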
    

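    Now when you run mvn package you'll get two jars in your target folder, provided the plugin's single goal is bound to the package phase (otherwise run mvn package assembly:single):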

    1. ${project.artifactId}-${project.version}.jar - contains just the classes and resources of your project
    2. ${project.artifactId}-${project.version}-jar-with-dependencies.jar - contains your classes / resources plus everything from your dependency tree with compile (or runtime) scope, unpacked and repacked into a single jar

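    For multiple entry points, you don't need to do anything specific. Just make sure the jar manifest doesn't define a Main-Class entry. That only matters if you explicitly configure a manifest; the default manifest doesn't name a Main-Class, so you should be fine. With no Main-Class in the manifest, hadoop jar treats the first argument after the jar name as the class to run.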
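
To see why a Main-Class entry gets in the way, here is a minimal sketch of how a launcher can choose the class to run. It is a simplified imitation written for illustration, not Hadoop's actual code, and it assumes the launcher prefers a manifest Main-Class and only falls back to the first command-line argument when none is set. The names EntryPointDemo, resolveMainClass, jobArgs and launch are hypothetical, and the nested SomeClass1 / AnotherClass are stand-ins for real job drivers.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class EntryPointDemo {

    // Hypothetical stand-ins for two job drivers packaged in the same jar.
    public static class SomeClass1 {
        public static void main(String[] args) {
            System.out.println("SomeClass1 started with " + Arrays.toString(args));
        }
    }

    public static class AnotherClass {
        public static void main(String[] args) {
            System.out.println("AnotherClass started with " + Arrays.toString(args));
        }
    }

    // A manifest Main-Class wins; otherwise the first argument names the class.
    public static String resolveMainClass(Manifest manifest, String[] args) {
        String fromManifest =
                manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        if (fromManifest != null) {
            return fromManifest;
        }
        if (args.length == 0) {
            throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
        }
        return args[0];
    }

    // Arguments handed to the job: the class name is consumed only when it came from args.
    public static String[] jobArgs(Manifest manifest, String[] args) {
        boolean fixedMainClass =
                manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS) != null;
        return fixedMainClass ? args : Arrays.copyOfRange(args, 1, args.length);
    }

    // Loads the chosen class and calls its static main method reflectively.
    public static void launch(Manifest manifest, String[] args) throws Exception {
        Class<?> mainClass = Class.forName(resolveMainClass(manifest, args));
        Method main = mainClass.getMethod("main", String[].class);
        main.invoke(null, (Object) jobArgs(manifest, args));
    }

    public static void main(String[] args) throws Exception {
        // An empty manifest has no Main-Class, like the default one described above.
        Manifest noMainClass = new Manifest();
        launch(noMainClass, new String[] {SomeClass1.class.getName(), "-params"});
        launch(noMainClass, new String[] {AnotherClass.class.getName(), "-params"});
    }
}
```

Running EntryPointDemo starts each nested class in turn with the single argument -params. If the manifest did name a Main-Class, resolveMainClass would always return that class and the class name on the command line would be passed through as an ordinary argument, which is why a jar meant to hold several jobs should leave the entry out.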
