Unable to export a table from HBase


Problem description

I am unable to export a table from HBase into HDFS. The error trace is below. The table is quite large. Are there any other ways to export it?

I used the command below for the export. I increased the RPC timeout, but the job still failed.

sudo -u hdfs hbase -Dhbase.rpc.timeout=1000000 org.apache.hadoop.hbase.mapreduce.Export My_Table /hdfs_path

15/05/05 08:50:27 INFO mapreduce.Job:  map 0% reduce 0%
15/05/05 08:50:55 INFO mapreduce.Job: Task Id : attempt_1424936551928_0234_m_000001_0, Status : FAILED
Error: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:410)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:230)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
        at java.lang.Thread.run(Thread.java:745)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:304)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
        ... 13 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
        at java.lang.Thread.run(Thread.java:745)

        at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
        at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
        at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:30328)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
        ... 17 more

Solution

I'd suggest looking at the code and doing a phase-wise export.

If the table is really large, here are some tips you could try. Looking at the code of the Export command, you can adjust the cache size and apply a scan filter.

Please see the Export API from HBase below.

Also see the usage() method: it shows the additional options available.

In my experience, tuning the cache size (not the batch size, which is the number of columns fetched per call) and/or a custom filter condition should work for you. For example, if your keys start with something like 0_, where 0 identifies a region, first export those rows by specifying that prefix as the filter, then export the next region's data, and so on. Below is the getExportFilter snippet, which shows how the filter argument is interpreted.

  private static Filter getExportFilter(String[] args) {
    Filter exportFilter = null;
    String filterCriteria = (args.length > 5) ? args[5] : null;
    if (filterCriteria == null) return null;
    if (filterCriteria.startsWith("^")) {
      String regexPattern = filterCriteria.substring(1, filterCriteria.length());
      exportFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regexPattern));
    } else {
      exportFilter = new PrefixFilter(Bytes.toBytesBinary(filterCriteria));
    }
    return exportFilter;
  }

  /*
   * @param errorMsg Error message.  Can be null.
   */
  private static void usage(final String errorMsg) {
    if (errorMsg != null && errorMsg.length() > 0) {
      System.err.println("ERROR: " + errorMsg);
    }
    System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
      "[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]\n");
    System.err.println("  Note: -D properties will be applied to the conf used. ");
    System.err.println("  For example: ");
    System.err.println("   -D mapreduce.output.fileoutputformat.compress=true");
    System.err.println("   -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec");
    System.err.println("   -D mapreduce.output.fileoutputformat.compress.type=BLOCK");
    System.err.println("  Additionally, the following SCAN properties can be specified");
    System.err.println("  to control/limit what is exported..");
    System.err.println("   -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
    System.err.println("   -D " + RAW_SCAN + "=true");
    System.err.println("   -D " + TableInputFormat.SCAN_ROW_START + "=<ROWSTART>");
    System.err.println("   -D " + TableInputFormat.SCAN_ROW_STOP + "=<ROWSTOP>");
    System.err.println("   -D " + JOB_NAME_CONF_KEY
        + "=jobName - use the specified mapreduce job name for the export");
    System.err.println("For performance consider the following properties:\n"
        + "   -Dhbase.client.scanner.caching=100\n"
        + "   -Dmapreduce.map.speculative=false\n"
        + "   -Dmapreduce.reduce.speculative=false");
    System.err.println("For tables with very wide rows consider setting the batch size as below:\n"
        + "   -D" + EXPORT_BATCHING + "=10");
  }
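
For illustration, here is a sketch of what a phase-wise export could look like on the command line, following the usage() output above. The key prefixes (0_, 1_), the output directories, and the property values are assumptions for this example, not part of the original question; the positional arguments after the output directory follow the usage pattern <versions> <starttime> <endtime> <prefix>, and the batching property shown is the value normally behind EXPORT_BATCHING.

# Export only rows whose keys start with "0_", with scanner caching and
# batching kept modest so each scanner RPC stays small and finishes quickly.
# 1 version, full time range (0 .. Long.MAX_VALUE), prefix filter "0_".
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
    -Dhbase.client.scanner.caching=100 \
    -Dhbase.export.scanner.batch=10 \
    -Dmapreduce.map.speculative=false \
    -Dmapreduce.reduce.speculative=false \
    My_Table /hdfs_path/part_0 1 0 9223372036854775807 0_

# Then repeat for the next key prefix, writing to a separate output directory.
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
    -Dhbase.client.scanner.caching=100 \
    -Dhbase.export.scanner.batch=10 \
    My_Table /hdfs_path/part_1 1 0 9223372036854775807 1_

Running the export in slices like this keeps each scan short, which avoids the scanner timing out on the region server and triggering the OutOfOrderScannerNextException seen in the trace.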
