Pointing HiveServer2 to MiniMRCluster for Hive Testing


Problem Description

    I've been wanting to do Hive integration testing for some of the code I've been developing. I have two major requirements for the testing framework:

    1. It needs to work with a Cloudera version of Hive and Hadoop (preferably 2.0.0-cdh4.7.0)
    2. It needs to be all local. That is, the Hadoop cluster and Hive server should start at the beginning of the test, run a few queries, and tear down after the test is over.

    So I broke this problem down into three parts:

    1. Getting code for the HiveServer2 part (I decided to use a JDBC connector over a Thrift service client)
    2. Getting code for building an in-memory MapReduce cluster (I decided to use MiniMRCluster for this)
    3. Setting up both (1) and (2) above to work with each other.

    I was able to get (1) out of the way by looking at many resources; several of them were very useful.

    For (2), I followed an excellent post on StackOverflow.

    So far, so good. At this point, with both of the above pieces included, the pom.xml in my Maven project looks something like this:

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>
    
    <dependencies>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
        </dependency>
        <!-- START: dependencies for getting MiniMRCluster to work -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-auth</artifactId>
            <version>2.0.0-cdh4.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-test</artifactId>
            <version>2.0.0-mr1-cdh4.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.0.0-cdh4.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.0.0-cdh4.7.0</version>
            <classifier>tests</classifier>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.0.0-cdh4.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.0.0-cdh4.7.0</version>
            <classifier>tests</classifier>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>2.0.0-mr1-cdh4.7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>2.0.0-mr1-cdh4.7.0</version>
            <classifier>tests</classifier>
        </dependency>
        <!-- END: dependencies for getting MiniMRCluster to work -->
    
        <!-- START: dependencies for getting Hive JDBC to work -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-builtins</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-cli</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-metastore</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-serde</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-common</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>${hive.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libfb303</artifactId>
            <version>0.9.1</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.15</version>
        </dependency>
        <dependency>
            <groupId>org.antlr</groupId>
            <artifactId>antlr-runtime</artifactId>
            <version>3.5.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.derby</groupId>
            <artifactId>derby</artifactId>
            <version>10.10.1.1</version>
        </dependency>
        <dependency>
            <groupId>javax.jdo</groupId>
            <artifactId>jdo2-api</artifactId>
            <version>2.3-ec</version>
        </dependency>
        <dependency>
            <groupId>jpox</groupId>
            <artifactId>jpox</artifactId>
            <version>1.1.9-1</version>
        </dependency>
        <dependency>
            <groupId>jpox</groupId>
            <artifactId>jpox-rdbms</artifactId>
            <version>1.2.0-beta-5</version>
        </dependency>
        <!-- END: dependencies for getting Hive JDBC to work -->
    </dependencies>
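
    One thing the snippet doesn't show is where ${hive.version} is defined. Presumably a properties block along these lines sits elsewhere in the pom; the version shown here is an assumption (CDH 4.7.0 shipped Hive 0.10.0), not something stated in the original:

    <properties>
        <!-- Assumed value: adjust to the Hive artifact version of your CDH release -->
        <hive.version>0.10.0-cdh4.7.0</hive.version>
    </properties>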
    

    Now I'm on step (3). I tried running the following code:

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MiniMRCluster;
    import org.junit.Test;

    @Test
    public void testHiveMiniDFSClusterIntegration() throws IOException, SQLException {
        Configuration conf = new Configuration();

        /* Build MiniDFSCluster */
        MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

        /* Build MiniMR cluster */
        System.setProperty("hadoop.log.dir", "/Users/nishantkelkar/IdeaProjects/" +
                "nkelkar-incubator/hive-test/target/hive/logs");
        int numTaskTrackers = 1;
        int numTaskTrackerDirectories = 1;
        String[] racks = null;
        String[] hosts = null;
        MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers, miniDFS.getFileSystem().getUri().toString(),
                numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

        // point Hive's MapReduce jobs at the mini cluster's JobTracker
        System.setProperty("mapred.job.tracker", miniMR.createJobConf(
                new JobConf(conf)).get("mapred.job.tracker"));

        try {
            String driverName = "org.apache.hive.jdbc.HiveDriver";
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        // an empty hive2 URL starts HiveServer2 in embedded (in-process) mode
        Connection hiveConnection = DriverManager.getConnection(
                "jdbc:hive2:///", "", "");
        Statement stm = hiveConnection.createStatement();

        // now create test tables and query them
        stm.execute("set hive.support.concurrency = false");
        stm.execute("drop table if exists test");
        stm.execute("create table if not exists test(a int, b int) row format delimited fields terminated by ' '");
        stm.execute("create table dual as select 1 as one from test");
        stm.execute("insert into table test select stack(1,4,5) AS (a,b) from dual");
        stm.execute("select * from test");
    }
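
    As an aside, the test above never tears the clusters down, even though requirement (2) calls for it. A minimal cleanup sketch, assuming miniDFS and miniMR are promoted from locals to fields so that an @After method can reach them (both classes expose a shutdown() method):

    import org.junit.After;

    @After
    public void tearDownClusters() {
        if (miniMR != null) {
            miniMR.shutdown();   // stops the JobTracker and TaskTrackers
        }
        if (miniDFS != null) {
            miniDFS.shutdown();  // stops the NameNode and DataNodes
        }
    }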
    

    My hope was that (3) would be solved by the following line of code from the above method, since a jdbc:hive2:/// URL with no host or port starts HiveServer2 in embedded mode inside the test's own JVM:

        Connection hiveConnection = DriverManager.getConnection(
                "jdbc:hive2:///", "", "");
    

    However, I'm getting the following error:

    java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:161)
        at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:150)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:207)
        at com.ask.nkelkar.hive.HiveUnitTest.testHiveMiniDFSClusterIntegration(HiveUnitTest.java:54)
    

    Can anyone let me know what else I need to do, or what I'm doing wrong, to get this to work?

    P.S. I looked at the HiveRunner and hive_test projects as options, but I wasn't able to get them to work with Cloudera versions of Hadoop.

    Solution

    Your test is failing at the first create table statement. Hive is unhelpfully suppressing the following error message:

    file:/user/hive/warehouse/test is not a directory or unable to create one
    

    Hive is attempting to use the default warehouse directory /user/hive/warehouse which doesn't exist on your filesystem. You could create the directory, but for testing you'll likely want to override the default value. For example:

    import static org.apache.hadoop.hive.conf.HiveConf.ConfVars;
    ...
    System.setProperty(ConfVars.METASTOREWAREHOUSE.toString(), "/Users/nishantkelkar/IdeaProjects/" +
                "nkelkar-incubator/hive-test/target/hive/warehouse");
    
