Efficient way to delete multiple rows in HBase


Problem description


Is there an efficient way to delete multiple rows in HBase, or does my use case smell like a poor fit for HBase?

There is a table, say 'chart', which contains the items that appear in charts. Row keys have the following format: chart|date_reversed|ranked_attribute_value_reversed|content_id

Sometimes I want to regenerate the chart for a given date, so I want to delete all rows from 'chart|date_reversed_1' to 'chart|date_reversed_2'. Is there a better way than issuing a Delete for each row found by a Scan? All the rows to be deleted are going to be close to each other.

I need to delete the rows because I don't want one item (one content_id) to have multiple entries, which it would have if its ranked_attribute_value had changed (such a change is the reason the chart needs to be regenerated).

I'm an HBase beginner, so perhaps I'm misusing rows for something that columns would handle better. If you have design suggestions, cool! Or maybe the charts are better generated into a file (i.e. no HBase for output)? I'm using MapReduce.

Solution

First, on range deletes: as far as I know, HBase has no range delete yet. But the HTableInterface API does offer a way to delete more than one row at a time. Simply create a Delete object for each row key returned by the scan, put them in a List, and pass the list to that API. To make the scan faster, keep column data out of the scan result (for example with a key-only filter), since all you need to delete whole rows is the row key.
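The scan-then-batch-delete pattern can be sketched without a cluster. The following is only an illustration in plain Java: a sorted TreeMap stands in for the HBase table, and the class name, method names, and the way date_reversed is encoded (99999999 minus yyyyMMdd) are assumptions made for this example, not taken from the question. With the real client, the first step would be a Scan with start and stop rows and the second the batch delete(List<Delete>) call mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in: a sorted map plays the role of the HBase table.
class RangeDeleteSketch {

    // "Scan": collect only the row keys in [startRow, stopRow).
    // The keys are copied so the table can be modified afterwards.
    static List<String> scanRowKeys(NavigableMap<String, String> table,
                                    String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    // "Batch delete": remove every listed row in one call; returns how many existed.
    static int deleteAll(NavigableMap<String, String> table, List<String> rowKeys) {
        int removed = 0;
        for (String key : rowKeys) {
            if (table.remove(key) != null) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        // Assumed encoding: date_reversed = 99999999 - yyyyMMdd, so newer dates sort first.
        table.put("chart|79879896|9990|c7", "2012-01-03");
        table.put("chart|79879897|9950|c1", "2012-01-02");
        table.put("chart|79879897|9980|c2", "2012-01-02");
        table.put("chart|79879898|9900|c3", "2012-01-01");

        // Regenerating the chart for 2012-01-02: delete exactly that date's rows.
        List<String> keys = scanRowKeys(table, "chart|79879897|", "chart|79879898|");
        int removed = deleteAll(table, keys);
        System.out.println(removed + " rows deleted, " + table.size() + " rows left");
    }
}
```

Because the rows for one date share a key prefix, the start row is that prefix and the stop row is the prefix of the next date_reversed value, which is excluded from the range.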

Second, about the design. My understanding of the requirement is this: there are contents, each with a content id; charts are generated for each content and stored; a content can have multiple charts, distinguished by date and depending on the rank. In addition, we want the most recently generated content's chart to show at the top of the table.

Under that assumption, I would suggest three tables: auto_id, content_charts and generated_order. The row key for content_charts would be the content id, and the row key for generated_order would be a long that is auto-decremented using the HTableInterface API. To decrement, use '-1' as the amount to offset, and initialize the value to Long.MAX_VALUE in the auto_id table, either at the first start-up of the app or manually. To replace chart data, clean the column family using delete, put the new data back, and then make a put in the generated_order table. This way the latest insertion is always at the top of generated_order, which holds the content id as a cell value. If you want to ensure generated_order has only one entry per content, save the generated_order id into content_charts when putting; then, before deleting the column family, read that id and delete the corresponding row from generated_order. This way you can look up the charts for a content using at most 2 gets, with no scan required for the charts.
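To make the bookkeeping concrete, here is a rough in-memory sketch of the three-table idea in plain Java. Nothing in it talks to HBase: the maps stand in for the tables, an AtomicLong stands in for the counter cell in auto_id, and the class and method names are made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory model of auto_id, content_charts and generated_order.
class GeneratedOrderSketch {
    // auto_id: counter initialized to Long.MAX_VALUE, decremented with offset -1.
    private final AtomicLong autoId = new AtomicLong(Long.MAX_VALUE);
    // generated_order: decrementing id -> content id; the smallest (newest) id sorts first.
    private final TreeMap<Long, String> generatedOrder = new TreeMap<>();
    // content_charts: content id -> chart data, plus the saved generated_order id.
    private final Map<String, String> chartByContent = new HashMap<>();
    private final Map<String, Long> orderIdByContent = new HashMap<>();

    // Replace the chart for one content and move it to the top of generated_order.
    void regenerate(String contentId, String chartData) {
        Long oldOrderId = orderIdByContent.get(contentId);
        if (oldOrderId != null) {
            generatedOrder.remove(oldOrderId); // keep one entry per content
        }
        long newOrderId = autoId.addAndGet(-1);
        chartByContent.put(contentId, chartData); // overwrite the old chart data
        orderIdByContent.put(contentId, newOrderId);
        generatedOrder.put(newOrderId, contentId);
    }

    // Content id of the most recently generated chart, or null if none exists.
    String newestContent() {
        return generatedOrder.isEmpty() ? null : generatedOrder.firstEntry().getValue();
    }

    String chartFor(String contentId) {
        return chartByContent.get(contentId);
    }

    int orderEntries() {
        return generatedOrder.size();
    }

    public static void main(String[] args) {
        GeneratedOrderSketch store = new GeneratedOrderSketch();
        store.regenerate("c1", "chart v1");
        store.regenerate("c2", "chart v1");
        store.regenerate("c1", "chart v2"); // c1 moves back to the top
        System.out.println(store.newestContent() + " -> " + store.chartFor(store.newestContent()));
    }
}
```

Starting at Long.MAX_VALUE and counting down keeps every id positive, so the newest entry always has the smallest key and comes first in sorted order.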

I hope this is helpful.
