Pointing HiveServer2 to MiniMRCluster for Hive Testing
I've been wanting to do Hive integration testing for some of the code that I've been developing. The two major requirements of the testing framework that I need:
- It needs to work with a Cloudera version of Hive and Hadoop (preferably, 2.0.0-cdh4.7.0)
- It needs to be all local. That is, the Hadoop cluster and Hive server should start at the beginning of the test, run a few queries, and tear down after the test is over.
So I broke this problem down into three parts:
- Getting code for the HiveServer2 part (I decided to use a JDBC connector over a Thrift service client)
- Getting code for building an in-memory MapReduce cluster (I decided to use MiniMRCluster for this)
- Setting up both (1) and (2) above to work with each other.
I was able to get (1) out of the way by looking at many resources. For (2), I followed an excellent post on StackOverflow.
So far, so good. At this point, with both of the above pieces included, the pom.xml of my Maven project looks something like this:
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.1</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
</dependency>
<!-- START: dependencies for getting MiniMRCluster to work -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-auth</artifactId>
<version>2.0.0-cdh4.7.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-test</artifactId>
<version>2.0.0-mr1-cdh4.7.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.0.0-cdh4.7.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.0.0-cdh4.7.0</version>
<classifier>tests</classifier>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.7.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.0.0-cdh4.7.0</version>
<classifier>tests</classifier>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.0.0-mr1-cdh4.7.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>2.0.0-mr1-cdh4.7.0</version>
<classifier>tests</classifier>
</dependency>
<!-- END: dependencies for getting MiniMRCluster to work -->
<!-- START: dependencies for getting Hive JDBC to work -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-builtins</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-cli</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-serde</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-common</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.1</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.15</version>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
<artifactId>antlr-runtime</artifactId>
<version>3.5.1</version>
</dependency>
<dependency>
<groupId>org.apache.derby</groupId>
<artifactId>derby</artifactId>
<version>10.10.1.1</version>
</dependency>
<dependency>
<groupId>javax.jdo</groupId>
<artifactId>jdo2-api</artifactId>
<version>2.3-ec</version>
</dependency>
<dependency>
<groupId>jpox</groupId>
<artifactId>jpox</artifactId>
<version>1.1.9-1</version>
</dependency>
<dependency>
<groupId>jpox</groupId>
<artifactId>jpox-rdbms</artifactId>
<version>1.2.0-beta-5</version>
</dependency>
<!-- END: dependencies for getting Hive JDBC to work -->
</dependencies>
Now I'm on step (3). I tried running the following code:
@Test
public void testHiveMiniDFSClusterIntegration() throws IOException, SQLException {
    Configuration conf = new Configuration();

    /* Build MiniDFSCluster */
    MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

    /* Build MiniMR Cluster */
    System.setProperty("hadoop.log.dir", "/Users/nishantkelkar/IdeaProjects/" +
            "nkelkar-incubator/hive-test/target/hive/logs");
    int numTaskTrackers = 1;
    int numTaskTrackerDirectories = 1;
    String[] racks = null;
    String[] hosts = null;
    MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers, miniDFS.getFileSystem().getUri().toString(),
            numTaskTrackerDirectories, racks, hosts, new JobConf(conf));
    System.setProperty("mapred.job.tracker", miniMR.createJobConf(
            new JobConf(conf)).get("mapred.job.tracker"));

    try {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driverName);
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
        System.exit(1);
    }

    Connection hiveConnection = DriverManager.getConnection(
            "jdbc:hive2:///", "", "");
    Statement stm = hiveConnection.createStatement();

    // now create test tables and query them
    stm.execute("set hive.support.concurrency = false");
    stm.execute("drop table if exists test");
    stm.execute("create table if not exists test(a int, b int) row format delimited fields terminated by ' '");
    stm.execute("create table dual as select 1 as one from test");
    stm.execute("insert into table test select stack(1,4,5) AS (a,b) from dual");
    stm.execute("select * from test");
}
My hope was that (3) would be solved by the following line of code from the above method:
Connection hiveConnection = DriverManager.getConnection(
"jdbc:hive2:///", "", "");
However, I'm getting the following error:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:161)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:150)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:207)
at com.ask.nkelkar.hive.HiveUnitTest.testHiveMiniDFSClusterIntegration(HiveUnitTest.java:54)
Can anyone please let me know what I need to do in addition/what I'm doing wrong to get this to work?
P.S. I looked at HiveRunner and hive_test projects as options, but I wasn't able to get these to work with Cloudera versions of Hadoop.
Your test is failing at the first create table
statement. Hive is unhelpfully suppressing the following error message:
file:/user/hive/warehouse/test is not a directory or unable to create one
Hive is attempting to use the default warehouse directory /user/hive/warehouse
which doesn't exist on your filesystem. You could create the directory, but for testing you'll likely want to override the default value. For example:
import static org.apache.hadoop.hive.conf.HiveConf.ConfVars;
...
System.setProperty(ConfVars.METASTOREWAREHOUSE.toString(), "/Users/nishantkelkar/IdeaProjects/" +
"nkelkar-incubator/hive-test/target/hive/warehouse");
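Putting that together, here is a minimal, self-contained sketch of preparing a warehouse directory before the JDBC connection is opened. The helper class and base directory are illustrative, not part of the question's code; it uses the literal property key "hive.metastore.warehouse.dir", which is what ConfVars.METASTOREWAREHOUSE resolves to, so the plain-string form avoids needing HiveConf on the classpath here:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: point Hive at a throwaway warehouse directory under the
// build tree, creating it if needed, before DriverManager.getConnection() runs.
public class HiveTestWarehouse {

    // Literal key that ConfVars.METASTOREWAREHOUSE resolves to.
    static final String WAREHOUSE_KEY = "hive.metastore.warehouse.dir";

    static Path prepare(Path baseDir) throws IOException {
        Path warehouse = baseDir.resolve("hive").resolve("warehouse");
        // Creating the directory up front avoids the suppressed
        // "not a directory or unable to create one" DDLTask failure.
        Files.createDirectories(warehouse);
        System.setProperty(WAREHOUSE_KEY, warehouse.toString());
        return warehouse;
    }

    public static void main(String[] args) throws IOException {
        Path warehouse = prepare(Paths.get("target"));
        System.out.println(System.getProperty(WAREHOUSE_KEY));
    }
}
```

Calling something like prepare(...) from a @Before method would ensure the property is set and the directory exists before any Hive statement executes.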