How to delete the most recent version of all columns based on rowkey in HBase


Question


I have a requirement to delete data from HBase. I want to delete the latest version of each cell for a given row key. My idea was to fetch the column names and the latest timestamp of each column for that rowkey, then issue a delete for each column with its timestamp.

But I'm not able to get the column names, so I can't do it.

If you have any thoughts or working code, please share.

Solution

The official HBase guide for version 0.94 says:

Deletes work by creating tombstone markers. For example, let's suppose we want to delete a row. For this you can specify a version, or else by default the currentTimeMillis is used. What this means is "delete all cells where the version is less than or equal to this version". HBase never modifies data in place, so for example a delete will not immediately delete (or mark as deleted) the entries in the storage file that correspond to the delete condition. Rather, a so-called tombstone is written, which will mask the deleted values[17]. If the version you specified when deleting a row is larger than the version of any value in the row, then you can consider the complete row to be deleted.
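To make the quoted behaviour concrete, here is a minimal sketch of a single column modelled in plain Java. It is a toy model only: `TombstoneModel`, `deleteUpTo` and `deleteExact` are hypothetical names invented for illustration and are not part of the HBase API. It assumes two kinds of markers, one that masks every version at or below a timestamp and one that masks a single exact version, and it never removes stored values, mirroring the "tombstone masks the value" idea.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Toy in-memory model of one versioned column; NOT the HBase API. */
public class TombstoneModel {
    // timestamp -> value; entries are never removed, only masked
    private final NavigableMap<Long, String> versions = new TreeMap<>();
    // marker meaning "all versions <= this timestamp are deleted"
    private long maskUpTo = Long.MIN_VALUE;
    // markers meaning "exactly this version is deleted"
    private final Set<Long> maskExact = new HashSet<>();

    public void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Writes a marker that masks every version at or below ts. */
    public void deleteUpTo(long ts) {
        maskUpTo = Math.max(maskUpTo, ts);
    }

    /** Writes a marker that masks only the version stored at ts. */
    public void deleteExact(long ts) {
        maskExact.add(ts);
    }

    /** Returns the newest value that no marker masks, or null if none is visible. */
    public String get() {
        for (Map.Entry<Long, String> e : versions.descendingMap().entrySet()) {
            long ts = e.getKey();
            if (ts > maskUpTo && !maskExact.contains(ts)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TombstoneModel col = new TombstoneModel();
        col.put(100L, "v1");
        col.put(200L, "v2");
        col.put(300L, "v3");

        col.deleteExact(300L);         // mask only the latest version
        System.out.println(col.get()); // prints v2

        col.deleteUpTo(200L);          // mask everything at or below 200
        System.out.println(col.get()); // prints null
    }
}
```

The difference between the two markers is the point here: removing only the latest version needs a delete aimed at one exact version, whereas a delete carrying just a timestamp also hides every older version.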

So I don't see the problem with following the standard Delete procedure.

However, if you want to delete only the latest versions of your cells, you could use the setTimestamp method of the Scan class. For example:

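import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.*;

List<Delete> deletes = new ArrayList<>();
Scan scan = new Scan();
// latestVersionTimeStamp is a long; older client versions name this method setTimeStamp
scan.setTimestamp(latestVersionTimeStamp);
// set your filters here
try (ResultScanner rscanner = table.getScanner(scan)) {
    for (Result rs : rscanner) {
        Delete d = new Delete(rs.getRow());
        // A bare Delete(row) would remove the whole row, so target only
        // the cells the scan returned, each at its exact timestamp.
        for (Cell cell : rs.rawCells()) {
            d.addColumn(CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell), cell.getTimestamp());
        }
        deletes.add(d);
    }
    table.delete(deletes);
} catch (IOException e) {
    e.printStackTrace();
}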

However, if the timestamp is not the same across cells, this will not work for all of them. The following probably will:

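import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.hbase.client.*;

List<Delete> deletes = new ArrayList<>();
// your list of timestamps: one per scanned row, in scan order
List<Long> timestamps = new ArrayList<>();
Scan scan = new Scan();
// set your filters here
try (ResultScanner rscanner = table.getScanner(scan)) {
    Iterator<Long> ts = timestamps.iterator();
    for (Result rs : rscanner) {
        if (!ts.hasNext()) {
            break; // no timestamp supplied for this row
        }
        Delete d = new Delete(rs.getRow());
        // Per the guide quoted above, a row delete with a timestamp masks
        // every cell in the row whose version is <= that timestamp.
        d.setTimestamp(ts.next());
        deletes.add(d);
    }
    table.delete(deletes);
} catch (IOException e) {
    e.printStackTrace();
}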

I don't guarantee this will work, however. The official guide is vague enough that I may have misinterpreted something. If I did, let me know and I will delete this answer.

Where I sourced my information: the setTimestamp method of the Scan class and the setTimestamp method of the Delete class.
