Can't connect to Bigtable to scan HTable data due to hardcoded managed=true in hbase client jars


Problem description

I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code against the following list of jar files, which I grabbed from Dataproc. When I run the Pig script below, it fails while trying to establish a connection with Bigtable.

The error message is:

Bigtable does not support managed connections.

Questions:

  1. Is there a workaround for this issue?
  2. Is this a known issue, and is there a fix or adjustment planned?
  3. Is there a different way to implement multiple scans as a Pig load function that works with Bigtable?

Details:

Jar files:

hadoop-common-2.7.3.jar 
hbase-client-1.2.2.jar
hbase-common-1.2.2.jar
hbase-protocol-1.2.2.jar
hbase-server-1.2.2.jar
pig-0.16.0-core-h2.jar

Here's a simple Pig script using my custom load function:

%default gte         '2017-03-23T18:00Z'
%default lt          '2017-03-23T18:05Z'
%default SHARD_FIRST '00'
%default SHARD_LAST  '25'
%default GTE_SHARD   '$gte\_$SHARD_FIRST'
%default LT_SHARD    '$lt\_$SHARD_LAST'
raw = LOAD 'hbase://events_sessions'
      USING com.eduboom.pig.load.HBaseMultiScanLoader('$GTE_SHARD', '$LT_SHARD', 'event:*')
      AS (es_key:chararray, event_array);
DUMP raw;

My custom load function HBaseMultiScanLoader creates a list of Scan objects to perform multiple scans over different ranges of data in the events_sessions table, determined by the time range between gte and lt and sharded from SHARD_FIRST through SHARD_LAST.
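
As a rough illustration, here is a minimal sketch of what building such a scan list can look like. The row-key layout ("<timestamp>_<shard>", as suggested by the GTE_SHARD/LT_SHARD parameters above), the shard formatting, and the class name are my assumptions, not the actual implementation from the question:

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative sketch only: builds one Scan per shard, assuming row keys
// of the form "<timestamp>_<shard>".
public class ScanListSketch {
  public static List<Scan> buildScans(String gte, String lt,
                                      int shardFirst, int shardLast) {
    List<Scan> scans = new ArrayList<>();
    for (int shard = shardFirst; shard <= shardLast; shard++) {
      String suffix = String.format("_%02d", shard);
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes(gte + suffix)); // start row, inclusive
      scan.setStopRow(Bytes.toBytes(lt + suffix));   // stop row, exclusive
      scan.addFamily(Bytes.toBytes("event"));        // matches 'event:*' in the script
      scans.add(scan);
    }
    return scans;
  }
}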

HBaseMultiScanLoader extends org.apache.pig.LoadFunc so it can be used as a load function in a Pig script. When Pig runs my script, it calls LoadFunc.getInputFormat(). My implementation of getInputFormat() returns an instance of my custom class MultiScanTableInputFormat, which extends org.apache.hadoop.mapreduce.InputFormat. MultiScanTableInputFormat initializes an org.apache.hadoop.hbase.client.HTable object to set up the connection to the table.
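
The question does not show MultiScanTableInputFormat itself, but the failing pattern boils down to constructing an HTable directly from a Configuration, roughly like this (class name and error handling are illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HTable;

// Illustrative sketch only: the deprecated HTable(Configuration, TableName)
// constructor routes through ConnectionManager.getConnectionInternal(),
// which is what triggers the managed-connection error against Bigtable.
public class MultiScanInputFormatSketch implements Configurable {
  private Configuration conf;
  private HTable table;

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    try {
      table = new HTable(conf, TableName.valueOf("events_sessions")); // fails on Bigtable
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}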

Digging into the hbase-client source code, I see that org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal() calls org.apache.hadoop.hbase.client.ConnectionManager.createConnection() with the "managed" attribute hardcoded to "true". You can see from the stack trace below that my code (MultiScanTableInputFormat) tries to initialize an HTable object, which invokes getConnectionInternal(), which does not provide an option to set managed to false. Going down the stack trace, you reach AbstractBigtableConnection, which will not accept managed=true and therefore causes the connection to Bigtable to fail.
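
For reference, the relevant hbase-client 1.2.x code looks roughly like the abbreviated paraphrase below (see ConnectionManager in the hbase-client jar for the exact source); note the literal true passed as the managed flag:

// org.apache.hadoop.hbase.client.ConnectionManager (HBase 1.2.x, abbreviated paraphrase)
static ClusterConnection getConnectionInternal(final Configuration conf)
    throws IOException {
  HConnectionKey connectionKey = new HConnectionKey(conf);
  synchronized (CONNECTION_INSTANCES) {
    HConnectionImplementation connection = CONNECTION_INSTANCES.get(connectionKey);
    if (connection == null) {
      // managed is hardcoded to true; callers such as the deprecated
      // HTable constructors have no way to pass false here.
      connection = (HConnectionImplementation) createConnection(conf, true);
      CONNECTION_INSTANCES.put(connectionKey, connection);
    }
    // ... closed-connection handling and reference counting elided ...
    return connection;
  }
}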

Here’s the stack trace showing the error:

2017-03-24 23:06:44,890 [JobControl] ERROR com.turner.hbase.mapreduce.MultiScanTableInputFormat - java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:424)
    at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:302)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:185)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:151)
    at com.eduboom.hbase.mapreduce.MultiScanTableInputFormat.setConf(Unknown Source)
    at com.eduboom.pig.load.HBaseMultiScanLoader.getInputFormat(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:264)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    ... 26 more
Caused by: java.lang.IllegalArgumentException: Bigtable does not support managed connections.
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:123)
    at com.google.cloud.bigtable.hbase1_2.BigtableConnection.<init>(BigtableConnection.java:55)
    ... 31 more

Answer

The original problem was caused by the use of outdated and deprecated hbase client jars and classes.

I updated my code to use the newest hbase client jars provided by Google, and the original problem was fixed.
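
In practical terms, the fix means moving off the deprecated HTable constructor and onto the Connection/Table API, which creates unmanaged connections and is what the Bigtable client expects. A minimal sketch, with the table name taken from the script above and everything else assumed:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class UnmanagedConnectionSketch {
  public static void main(String[] args) throws IOException {
    // On Dataproc the Bigtable connection class is typically wired in
    // via hbase-site.xml (hbase.client.connection.impl), so plain
    // HBaseConfiguration.create() picks it up.
    Configuration conf = HBaseConfiguration.create();

    // ConnectionFactory creates unmanaged connections (managed=false),
    // which AbstractBigtableConnection accepts.
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("events_sessions"))) {
      // table.getScanner(scan), table.get(...), etc. work as usual here.
    }
  }
}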

I'm still stuck on a ZooKeeper (ZK) issue that I haven't figured out yet, but that's a conversation for a different question.
