Unable to export a table from HBase
Problem description

I am unable to export a table from HBase into HDFS. Below is the error trace. The table is quite large. Are there any other ways to export it?

I used the command below to export. I increased the RPC timeout, but the job still failed.
sudo -u hdfs hbase -Dhbase.rpc.timeout=1000000 org.apache.hadoop.hbase.mapreduce.Export My_Table /hdfs_path
15/05/05 08:50:27 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 08:50:55 INFO mapreduce.Job: Task Id : attempt_1424936551928_0234_m_000001_0, Status : FAILED
Error: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:410)
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:230)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
at java.lang.Thread.run(Thread.java:745)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:304)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:204)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
... 13 more
Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException): org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 229 number_of_rows: 100 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3198)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29925)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:116)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:96)
at java.lang.Thread.run(Thread.java:745)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1457)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:30328)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:174)
... 17 more
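One way to read this trace: an OutOfOrderScannerNextException after a retry usually means the scanner lease expired on the region server while the mapper was still processing a batch, so the retried next() call arrived with a stale call sequence. As a rough sanity check (the lease timeout and per-row cost below are illustrative assumptions, not values from this cluster), you can estimate whether a batch of scanner-cached rows fits inside the lease period:

```java
// Back-of-the-envelope check: does processing one scanner batch fit
// inside the scanner lease period? All numbers are illustrative.
public class ScannerLeaseCheck {
    // Returns true if a full cached batch can be processed before the lease expires.
    static boolean batchFitsInLease(int cachingRows, double msPerRow, long leaseTimeoutMs) {
        double batchMs = cachingRows * msPerRow; // time spent between next() calls
        return batchMs < leaseTimeoutMs;
    }

    public static void main(String[] args) {
        long lease = 60_000; // assumed scanner lease/timeout period of 60 s
        // 100 cached rows at 200 ms each: 20 s per batch -> fits
        System.out.println(batchFitsInLease(100, 200.0, lease));
        // 100 cached rows at 900 ms each: 90 s per batch -> lease expires mid-batch
        System.out.println(batchFitsInLease(100, 900.0, lease));
    }
}
```

This is why the answer below suggests tuning the cache size: fewer rows per batch means less work between next() calls, so the lease is renewed more often.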
If the table is really large, here are some tips you could try. I'd suggest looking at the code of the Export command from HBase and doing a phase-wise export; see the usage() method below, which gives you more options.

In my experience, you can adjust the cache size (not the batch size, which is the number of columns fetched at a time) and apply a scan filter; a custom filter condition should work for you.

For example: if your keys start like 0_, where 0 is a region name, first export those rows by specifying a filter, then the next region's data, and so on. Below is the ExportFilter snippet, through which you can understand how it works:

private static Filter getExportFilter(String[] args) {
  Filter exportFilter = null;
  String filterCriteria = (args.length > 5) ? args[5] : null;
  if (filterCriteria == null) return null;
  if (filterCriteria.startsWith("^")) {
    String regexPattern = filterCriteria.substring(1, filterCriteria.length());
    exportFilter = new RowFilter(CompareOp.EQUAL, new RegexStringComparator(regexPattern));
  } else {
    exportFilter = new PrefixFilter(Bytes.toBytesBinary(filterCriteria));
  }
  return exportFilter;
}
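To see how the filter argument is interpreted without standing up an HBase client, here is a minimal, dependency-free sketch of the same branching logic (the describeFilter helper and its return strings are hypothetical names for illustration; the real method returns HBase RowFilter/PrefixFilter instances):

```java
// Mirrors getExportFilter's branching: a leading '^' selects a regex row
// filter; anything else is treated as a row-key prefix.
public class ExportFilterArg {
    static String describeFilter(String[] args) {
        // Same positional convention as Export: the filter is the 6th argument.
        String filterCriteria = (args.length > 5) ? args[5] : null;
        if (filterCriteria == null) return "none";
        if (filterCriteria.startsWith("^")) {
            // Strip the caret and keep the regex body, as getExportFilter does.
            return "regex:" + filterCriteria.substring(1);
        }
        return "prefix:" + filterCriteria;
    }

    public static void main(String[] args) {
        String[] regexArgs = {"My_Table", "/hdfs_path", "1", "0", "0", "^0_.*"};
        String[] prefixArgs = {"My_Table", "/hdfs_path", "1", "0", "0", "0_"};
        System.out.println(describeFilter(regexArgs));   // regex:0_.*
        System.out.println(describeFilter(prefixArgs));  // prefix:0_
    }
}
```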
/*
 * @param errorMsg Error message. Can be null.
 */
private static void usage(final String errorMsg) {
  if (errorMsg != null && errorMsg.length() > 0) {
    System.err.println("ERROR: " + errorMsg);
  }
  System.err.println("Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> " +
      "[<starttime> [<endtime>]] [^[regex pattern] or [Prefix] to filter]]\n");
  System.err.println("  Note: -D properties will be applied to the conf used. ");
  System.err.println("  For example: ");
  System.err.println("  -D mapreduce.output.fileoutputformat.compress=true");
  System.err.println("  -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec");
  System.err.println("  -D mapreduce.output.fileoutputformat.compress.type=BLOCK");
  System.err.println("  Additionally, the following SCAN properties can be specified");
  System.err.println("  to control/limit what is exported..");
  System.err.println("  -D " + TableInputFormat.SCAN_COLUMN_FAMILY + "=<familyName>");
  System.err.println("  -D " + RAW_SCAN + "=true");
  System.err.println("  -D " + TableInputFormat.SCAN_ROW_START + "=<ROWSTART>");
  System.err.println("  -D " + TableInputFormat.SCAN_ROW_STOP + "=<ROWSTOP>");
  System.err.println("  -D " + JOB_NAME_CONF_KEY
      + "=jobName - use the specified mapreduce job name for the export");
  System.err.println("For performance consider the following properties:\n"
      + "   -Dhbase.client.scanner.caching=100\n"
      + "   -Dmapreduce.map.speculative=false\n"
      + "   -Dmapreduce.reduce.speculative=false");
  System.err.println("For tables with very wide rows consider setting the batch size as below:\n"
      + "   -D" + EXPORT_BATCHING + "=10");
}
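Putting the phase-wise idea together: one hypothetical way to drive region-by-region exports is to generate one Export invocation per region-name prefix. The prefix list, output layout, and positional values below are assumptions for illustration; the argument order follows the usage string above (table, output dir, versions, start time, end time, filter):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build one Export command line per region-name prefix, so each
// phase scans only the rows whose keys start with that prefix.
public class PhaseWiseExport {
    static List<String> buildExportCommands(String table, String outDirBase, String[] regionPrefixes) {
        List<String> cmds = new ArrayList<>();
        for (String prefix : regionPrefixes) {
            // The last positional argument is the filter criteria: no leading
            // '^', so Export treats it as a row-key prefix (PrefixFilter).
            cmds.add("hbase org.apache.hadoop.hbase.mapreduce.Export"
                    + " -Dhbase.client.scanner.caching=100 "
                    + table + " " + outDirBase + "/" + prefix
                    + " 1 0 9223372036854775807 " + prefix + "_");
        }
        return cmds;
    }

    public static void main(String[] args) {
        for (String cmd : buildExportCommands("My_Table", "/hdfs_path", new String[]{"0", "1", "2"})) {
            System.out.println(cmd);
        }
    }
}
```

Each command exports one region's rows into its own HDFS subdirectory, so a failed phase can be retried on its own instead of re-running the whole table scan.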