Hbase CopyTable inside Java


Problem description




I want to copy one Hbase table to another location with good performance.

I would like to reuse the code from CopyTable.java from the Hbase-server GitHub page.

I've been looking at the documentation from hbase but it didn't help me much: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CopyTable.html

After looking at this Stack Overflow post: Can a main() method of class be invoked in another class in java

I think I can directly call it using its main class.

Question: Do you think there is any better way to get this copy done than using CopyTable from hbase-server? Do you see any drawback to using this CopyTable?
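Since recent versions of CopyTable implement the standard Hadoop Tool interface, one way to reuse it without calling main() directly is to run it through ToolRunner. A minimal sketch, where the table names and peer cluster address are placeholders (on very old HBase versions that predate the Tool interface you would fall back to CopyTable.main(args)):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.CopyTable;
import org.apache.hadoop.util.ToolRunner;

public class CopyTableRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Same arguments the command-line tool accepts; all values here are placeholders.
        String[] copyArgs = new String[] {
            "--new.name=backupTable",            // destination table name
            "--peer.adr=remote-zk:2181:/hbase",  // omit to copy within the same cluster
            "sourceTable"                        // source table
        };
        int exitCode = ToolRunner.run(conf, new CopyTable(), copyArgs);
        System.exit(exitCode);
    }
}
```

This keeps the behavior identical to invoking the tool from the shell, while letting your own code control the configuration and arguments.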

Solution

Question: Do you think there is any better way to get this copy done than using CopyTable from hbase-server? Do you see any drawback to using this CopyTable?

First of all, a snapshot is a better approach than CopyTable.

  • HBase Snapshots allow you to take a snapshot of a table without too much impact on Region Servers. Snapshot, clone, and restore operations don't involve data copying. Also, exporting the snapshot to another cluster doesn't impact the Region Servers.

Prior to version 0.94.6, the only way to back up or clone a table was to use CopyTable/ExportTable, or to copy all the HFiles in HDFS after disabling the table. The disadvantage of these methods is that you can degrade region server performance (CopyTable/ExportTable), or you need to disable the table, which means no reads or writes; this is usually unacceptable.

Also, see Snapshots and Repeatable Reads for HBase Tables

Snapshot Internals
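As a sketch of the snapshot route with the modern client API (the table and snapshot names are placeholders, and this assumes a reachable cluster configured via hbase-site.xml):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SnapshotCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Take a snapshot of the source table; no data is copied.
            admin.snapshot("sourceTable-snap", TableName.valueOf("sourceTable"));
            // Clone the snapshot into a new table on the same cluster.
            admin.cloneSnapshot("sourceTable-snap", TableName.valueOf("copiedTable"));
        }
    }
}
```

For copying to another cluster, the ExportSnapshot MapReduce tool (org.apache.hadoop.hbase.snapshot.ExportSnapshot) ships the snapshot files directly, without loading the region servers.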


Another MapReduce approach besides CopyTable:

You can implement something like the code below. This version is for a standalone program; in the same way you can write a MapReduce job that inserts multiple Put records as a batch (of, say, 100,000).

This improved performance for standalone inserts through the HBase client; you can try the same approach from MapReduce.

public void addMultipleRecordsAtaShot(final ArrayList<Put> puts, final String tableName) throws Exception {
    HTable table = null;
    try {
        table = new HTable(HBaseConnection.getHBaseConfiguration(), getTable(tableName));
        table.put(puts); // sends the whole batch in one client call
        LOG.info("INSERT record[s] " + puts.size() + " to table " + tableName + " OK.");
    } catch (final Throwable e) {
        e.printStackTrace();
    } finally {
        LOG.info("Processed ---> " + puts.size());
        if (puts != null) {
            puts.clear();
        }
        if (table != null) {
            table.close(); // release client resources
        }
    }
}
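A hypothetical caller might batch Puts like this (the scanner, the destination table name, and the batch size are assumptions for illustration; Put.add(Cell) is the cell-copying call in this era of the API):

```java
// Sketch: accumulate Puts from a scan and flush them in batches.
final int BATCH_SIZE = 100000;
ArrayList<Put> puts = new ArrayList<>();
for (Result row : scanner) {                 // scanner over the source table
    Put put = new Put(row.getRow());
    for (Cell cell : row.rawCells()) {
        put.add(cell);                       // copy each cell as-is
    }
    puts.add(put);
    if (puts.size() >= BATCH_SIZE) {
        addMultipleRecordsAtaShot(puts, "destinationTable"); // clears the list
    }
}
if (!puts.isEmpty()) {
    addMultipleRecordsAtaShot(puts, "destinationTable");     // flush the remainder
}
```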

Along with that, you can also consider the following...

Set the client write buffer to a larger value than the default

1) table.setAutoFlush(false)

2) Set buffer size

<property>
    <name>hbase.client.write.buffer</name>
    <!-- You can double this for better performance: 2 x 20971520 = 41943040 -->
    <value>20971520</value>
</property>

OR

    void setWriteBufferSize(long writeBufferSize) throws IOException

The buffer is only ever flushed on two occasions:
Explicit flush
Use the flushCommits() call to send the data to the servers for permanent storage.

Implicit flush
This is triggered when you call put() or setWriteBufferSize(). Both calls compare the currently used buffer size with the configured limit and optionally invoke the flushCommits() method.

If you disable the buffer entirely by setting setAutoFlush(true), the client is forced to call the flush method for every invocation of put().
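Putting the buffering advice together, a minimal sketch against the classic HTable API (configuration and Put construction omitted; the 40 MB figure is the doubled value suggested above):

```java
// Sketch: client-side write buffering with the classic HTable API.
HTable table = new HTable(conf, "destinationTable");
table.setAutoFlush(false);               // don't flush on every put()
table.setWriteBufferSize(2 * 20971520L); // 40 MB, double the configured default above
for (Put put : puts) {
    table.put(put);                      // buffered client-side; flushed implicitly when the buffer fills
}
table.flushCommits();                    // explicit flush of anything still buffered
table.close();
```

The trade-off is durability: anything still sitting in the client buffer is lost if the client crashes before flushCommits() runs.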
