Can't connect to Bigtable to scan HTable data due to hardcoded managed=true in hbase client jars

Problem description

I'm working on a custom load function to load data from Bigtable using Pig on Dataproc. I compile my Java code using the following list of jar files grabbed from Dataproc. When I run the Pig script below, it fails when it tries to establish a connection with Bigtable.

The error message is:

Bigtable does not support managed connections.

Questions:

  1. Is there a workaround for this issue?
  2. Is this a known issue, and is there a plan to fix or change it?
  3. Is there a different way to implement multiple scans as a Pig load function that works with Bigtable?

Details:

Jar files:

hadoop-common-2.7.3.jar 
hbase-client-1.2.2.jar
hbase-common-1.2.2.jar
hbase-protocol-1.2.2.jar
hbase-server-1.2.2.jar
pig-0.16.0-core-h2.jar

Here's a simple Pig script using my custom load function:

%default gte         '2017-03-23T18:00Z'
%default lt          '2017-03-23T18:05Z'
%default SHARD_FIRST '00'
%default SHARD_LAST  '25'
%default GTE_SHARD   '$gte\_$SHARD_FIRST'
%default LT_SHARD    '$lt\_$SHARD_LAST'
raw = LOAD 'hbase://events_sessions'
      USING com.eduboom.pig.load.HBaseMultiScanLoader('$GTE_SHARD', '$LT_SHARD', 'event:*')
      AS (es_key:chararray, event_array);
DUMP raw;

My custom load function HBaseMultiScanLoader creates a list of Scan objects to perform multiple scans on different ranges of data in the table events_sessions determined by the time range between gte and lt and sharded by SHARD_FIRST through SHARD_LAST.
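
The loader's source isn't included in the question, but a minimal sketch of the scan-list construction might look like the following. MultiScanBuilder and buildKey are hypothetical names, and the exact row-key layout is an assumption; only the Scan calls come from the hbase-client 1.2 API in the jar list above.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiScanBuilder {

  // Hypothetical helper: the real layout of the row key (how the timestamp
  // and shard are combined) is not shown in the question.
  private static byte[] buildKey(String timestamp, String shard) {
    return Bytes.toBytes(timestamp + "_" + shard);
  }

  // One Scan per shard, each bounded by the [gte, lt) row-key range.
  public static List<Scan> buildScans(String gte, String lt,
                                      int firstShard, int lastShard) {
    List<Scan> scans = new ArrayList<>();
    for (int shard = firstShard; shard <= lastShard; shard++) {
      String s = String.format("%02d", shard);
      Scan scan = new Scan();
      scan.setStartRow(buildKey(gte, s));     // inclusive start row
      scan.setStopRow(buildKey(lt, s));       // exclusive stop row
      scan.addFamily(Bytes.toBytes("event")); // 'event:*' in the Pig script
      scans.add(scan);
    }
    return scans;
  }
}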

HBaseMultiScanLoader extends org.apache.pig.LoadFunc so it can be used in the Pig script as a load function. When Pig runs my script, it calls LoadFunc.getInputFormat(). My implementation of getInputFormat() returns an instance of my custom class MultiScanTableInputFormat, which extends org.apache.hadoop.mapreduce.InputFormat. MultiScanTableInputFormat constructs an org.apache.hadoop.hbase.client.HTable object to initialize the connection to the table.
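
A skeleton of that wiring, for reference. MultiScanTableInputFormat is the asker's custom class, whose source isn't shown, so this sketch won't compile on its own; the LoadFunc method signatures are from Pig 0.16.

import java.io.IOException;

import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;

public class HBaseMultiScanLoader extends LoadFunc {

  @Override
  public InputFormat getInputFormat() throws IOException {
    // Pig calls this when planning the job; MultiScanTableInputFormat is the
    // asker's custom class, which opens the HTable connection.
    return new MultiScanTableInputFormat();
  }

  @Override
  public void setLocation(String location, Job job) throws IOException {
    // Parse 'hbase://events_sessions' and configure the job here.
  }

  @Override
  public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
    // Keep a handle on the RecordReader for use in getNext().
  }

  @Override
  public Tuple getNext() throws IOException {
    // Convert each scanned row into a Pig tuple (es_key, event_array).
    return null; // placeholder
  }
}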

Digging into the hbase-client source code, I see that org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal() calls org.apache.hadoop.hbase.client.ConnectionManager.createConnection() with the attribute "managed" hardcoded to "true". You can see from the stack trace below that my code (MultiScanTableInputFormat) tries to initialize an HTable object, which invokes getConnectionInternal(), and that call provides no option to set managed to false. Going down the stack trace, you reach AbstractBigtableConnection, which will not accept managed=true and therefore causes the connection to Bigtable to fail.
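
For illustration, a minimal repro of that failing path (the bare Configuration is a placeholder; the deprecated HTable constructor is the real hbase-client 1.2 API):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HTable;

public class ManagedConnectionRepro {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Deprecated in hbase-client 1.2: this constructor goes through
    // ConnectionManager.getConnectionInternal(), which passes managed=true.
    // With the Bigtable connection implementation configured,
    // AbstractBigtableConnection rejects that and throws the
    // IllegalArgumentException shown in the stack trace below.
    HTable table = new HTable(conf, TableName.valueOf("events_sessions"));
    table.close();
  }
}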

Here’s the stack trace showing the error:

2017-03-24 23:06:44,890 [JobControl] ERROR com.turner.hbase.mapreduce.MultiScanTableInputFormat - java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:424)
    at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:302)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:185)
    at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:151)
    at com.eduboom.hbase.mapreduce.MultiScanTableInputFormat.setConf(Unknown Source)
    at com.eduboom.pig.load.HBaseMultiScanLoader.getInputFormat(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:264)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.pig.backend.hadoop23.PigJobControl.submit(PigJobControl.java:128)
    at org.apache.pig.backend.hadoop23.PigJobControl.run(PigJobControl.java:194)
    at java.lang.Thread.run(Thread.java:745)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:276)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    ... 26 more
Caused by: java.lang.IllegalArgumentException: Bigtable does not support managed connections.
    at org.apache.hadoop.hbase.client.AbstractBigtableConnection.<init>(AbstractBigtableConnection.java:123)
    at com.google.cloud.bigtable.hbase1_2.BigtableConnection.<init>(BigtableConnection.java:55)
    ... 31 more

Answer

The original problem was caused by the use of outdated and deprecated hbase client jars and classes.

I updated my code to use the newest HBase client jars provided by Google, and the original problem was fixed.
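
The answer doesn't show the updated code, but with the Google-provided bigtable-hbase client for HBase 1.x, the usual non-deprecated pattern is to open the connection explicitly and ask it for a Table. A sketch, with placeholder project and instance IDs:

import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Table;

import com.google.cloud.bigtable.hbase.BigtableConfiguration;

public class BigtableUnmanagedConnection {
  public static void main(String[] args) throws IOException {
    // BigtableConfiguration.connect() returns an unmanaged Connection, so the
    // hardcoded managed=true path in the old HTable constructor is never hit.
    // "my-project" and "my-instance" are placeholder IDs.
    try (Connection connection =
             BigtableConfiguration.connect("my-project", "my-instance")) {
      Table table = connection.getTable(TableName.valueOf("events_sessions"));
      // Hand the Table (or per-scan ResultScanners) to the InputFormat here.
      table.close();
    }
  }
}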

I'm still stuck on a ZooKeeper (ZK) issue that I haven't figured out yet, but that's a conversation for a different question.

This answered it!
