Copy data from one HBase table to another


Problem description

I have created a Hive table, hivetest, which also creates a table in HBase named 'hbasetest'. Now I want to copy the data in 'hbasetest' into another HBase table (say, logdata) with the same schema. Can anyone tell me how to copy the data from 'hbasetest' to 'logdata' without using Hive?

CREATE TABLE hivetest(cookie string, timespent string, pageviews string, visit string, logdate string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "m:timespent, m:pageviews, m:visit, m:logdate")
TBLPROPERTIES ("hbase.table.name" = "hbasetest");

Updated question:

I created the table logdata as shown below, but I get the following error when copying.

create 'logdata', {NAME => ' m', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS =>'0', TTL => '2147483647', BLOCKSIZE=> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

13/09/23 12:57:19 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_0, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family  m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020, 
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
    at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:29 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_1, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family  m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020, 
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
    at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:38 INFO mapred.JobClient: Task Id : attempt_201309231115_0025_m_000000_2, Status : FAILED
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 755 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family  m does not exist in region logdata,,1379920697845.30fce8bcc99bf9ed321720496a3ec498. in table 'logdata', {NAME => 'm', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', ENCODE_ON_DISK => 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3773)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
: 755 times, servers with issues: master:60020, 
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1674)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1450)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:916)
    at org.apache.hadoop.hbase.client.HTable.close(HTable.java:953)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.close(TableOutputFormat.java:109)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:651)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/09/23 12:57:53 INFO mapred.JobClient: Job complete: job_201309231115_0025
13/09/23 12:57:53 INFO mapred.JobClient: Counters: 7
13/09/23 12:57:53 INFO mapred.JobClient:   Job Counters 
13/09/23 12:57:53 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=34605
13/09/23 12:57:53 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/23 12:57:53 INFO mapred.JobClient:     Rack-local map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient:     Launched map tasks=4
13/09/23 12:57:53 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/09/23 12:57:53 INFO mapred.JobClient:     Failed map tasks=1

Solution

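It turns out I am using hive-0.9.0, which has a bug:

https://issues.apache.org/jira/browse/HIVE-3243

When the table is created, the SerDe of HBaseStorageHandler does not ignore whitespace between a comma and the following column family in "hbase.columns.mapping". The leading space therefore becomes part of the column family name, which is why the error reports that column family ' m' (with a leading space) does not exist while the table only has 'm'. Remove the whitespace from the mapping and it works fine.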
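The effect of that whitespace can be illustrated with a small sketch. This is a simplified model written for this explanation, not Hive's actual parsing code: the function names `parse_mapping` and `families` are hypothetical, and only the two mapping strings come from the question. It shows how splitting the mapping on commas without trimming yields a second family name with a leading space, and how a mapping without spaces avoids it.

```python
from typing import List, Tuple

# Simplified model of the behaviour described in HIVE-3243.
# NOT Hive's real implementation; names here are hypothetical.


def parse_mapping(mapping: str, strip: bool = False) -> List[Tuple[str, str]]:
    """Split a columns-mapping string into (family, qualifier) pairs.

    With strip=False the whitespace after each comma is kept, which
    mimics the behaviour reported for hive-0.9.0.
    """
    pairs = []
    for entry in mapping.split(","):
        if strip:
            entry = entry.strip()
        family, qualifier = entry.split(":", 1)
        pairs.append((family, qualifier))
    return pairs


def families(mapping: str, strip: bool = False) -> List[str]:
    """Return the distinct column family names found in the mapping."""
    return sorted({family for family, _ in parse_mapping(mapping, strip)})


# Mapping as written in the question (spaces after the commas).
MAPPING_WITH_SPACES = "m:timespent, m:pageviews, m:visit, m:logdate"
# Same mapping with the whitespace removed, as the answer suggests.
MAPPING_FIXED = "m:timespent,m:pageviews,m:visit,m:logdate"

if __name__ == "__main__":
    print(families(MAPPING_WITH_SPACES))  # two families: ' m' and 'm'
    print(families(MAPPING_FIXED))        # one family: 'm'
```

Under this model the original mapping produces both `' m'` and `'m'`, so rows written under `' m'` cannot be copied into a table that only defines `'m'`.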
