Trouble using HBase from Java on Amazon EMR

Question

I'm trying to query my HBase cluster on Amazon EC2 using a custom jar that I launch as a MapReduce step. In my jar (inside the map function) I call HBase like this:

public void map( Text key, BytesWritable value, Context contex ) throws IOException, InterruptedException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "tablename");
      ...

The problem is that when execution reaches the HTable line and tries to connect to HBase, the step fails with the following errors:

2014-02-28 18:00:49,936 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
2014-02-28 18:00:49,974 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 5119@ip-10-0-35-130.ec2.internal
2014-02-28 18:00:49,998 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-02-28 18:00:50,005 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

      ...

2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-02-28 18:01:05,542 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
2014-02-28 18:01:05,542 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid

      ... and on and on

I can use the hbase shell just fine, and can query data from it. I have no clue where to start, and I've been googling for hours with no luck. Most reports of this problem online don't discuss Amazon-specific fixes. I thought ZooKeeper and HBase would be connected properly by the Amazon bootstrap automatically.

I'm using the HBase 0.94.17 jar and Amazon is running HBase 0.94.7. I'm pretty sure that's not the problem; my guess is that I'm not setting up the Java code correctly. Any help would be greatly appreciated. Thanks.

Recommended answer

Well, after almost 30 hours of trying, I've found the solution. There are many caveats, and versions matter.

In this case I'm using Amazon EMR with Hadoop 2 (AMI 3.0.4) and HBase 0.94.7, and I'm trying to run a custom jar on the same cluster to access HBase locally through Java.

First, the default HBase configuration will not work because of the external/internal IP idiosyncrasies that EC2 faces. So you can't rely on HBaseConfiguration.create() as in the question, because it defaults to a localhost ZooKeeper quorum (the connectString=localhost:2181 in the log above). Instead, use the configuration that Amazon sets up for you (located in /home/hadoop/hbase/conf/hbase-site.xml) and manually add it to a blank Configuration object.

The connection code looks like this:

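// addResource(String) searches the classpath; wrapping the location in an
// org.apache.hadoop.fs.Path makes Hadoop read it from the local filesystem.
Configuration conf = new Configuration();
conf.addResource(new Path("/home/hadoop/hbase/conf/hbase-site.xml"));
// Throws if HBase cannot be reached with this configuration.
HBaseAdmin.checkHBaseAvailable(conf);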
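
Before wiring the file into a job, it can help to confirm which ZooKeeper quorum the EMR-provided hbase-site.xml actually names, since the log in the question shows the client falling back to localhost:2181. The following is only a diagnostic sketch, not part of the original answer: the class name SiteConfigReader and the method readProperty are hypothetical, it uses just the JDK's XML parser (no Hadoop or HBase classes), and it assumes the file follows the usual Hadoop site-file layout of property elements with name and value children. hbase.zookeeper.quorum is the standard HBase property for the quorum hosts.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads one property from a Hadoop-style site XML file.
class SiteConfigReader {

    /** Returns the value of the named property, or null if it is absent. */
    static String readProperty(String path, String key) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new File(path));

        // Each setting is a <property> element with <name> and <value> children.
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (key.equals(textOf(prop, "name"))) {
                return textOf(prop, "value");
            }
        }
        return null;
    }

    // Text of the first child element with the given tag, or null if missing.
    private static String textOf(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0
                ? null
                : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Default path is the one given in the answer; override via args[0].
        String path = args.length > 0
                ? args[0]
                : "/home/hadoop/hbase/conf/hbase-site.xml";
        System.out.println("hbase.zookeeper.quorum = "
                + readProperty(path, "hbase.zookeeper.quorum"));
    }
}
```

If this prints null or localhost when run on a cluster node, the file is probably not the one the HBase shell is using, and loading it into the Configuration object would not be expected to change the behaviour seen in the question.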

Second, you have to package the correct HBase jar into your custom jar. The reason is that HBase 0.94.x is compiled for Hadoop 1 by default, so you have to grab the Cloudera HBase jar named hbase-0.94.6-cdh4.3.0.jar (you can find it online), which has been compiled against Hadoop 2. If you skip this step you will get many nasty, un-googleable errors, including the org.apache.hadoop.net.NetUtils exception.
