Efficient way to delete multiple rows in HBase


Question

Is there an efficient way to delete multiple rows in HBase, or does my use case smell like a poor fit for HBase?

There is a table, say 'chart', which contains the items that appear in charts. Row keys have the following format: chart|date_reversed|ranked_attribute_value_reversed|content_id
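A minimal sketch of how such a composite row key could be built. The question does not say how the "reversed" parts are encoded, so this assumes a common convention: subtract the value from Long.MAX_VALUE and zero-pad it to a fixed width, so that newer dates and higher ranked values sort first. The class name `ChartRowKeys` and its helpers are hypothetical.

```java
import java.time.LocalDate;

// Hypothetical helper; the encoding below is an assumption, not the asker's actual scheme.
final class ChartRowKeys {

    // Reverse a non-negative value so that larger inputs sort first lexicographically.
    // Long.MAX_VALUE has 19 digits, so padding to 19 keeps every key part the same width.
    static String reverse(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        return String.format("%019d", Long.MAX_VALUE - value);
    }

    // chart|date_reversed|ranked_attribute_value_reversed|content_id
    static String rowKey(String chart, LocalDate date, long rankedValue, String contentId) {
        return String.join("|", chart, reverse(date.toEpochDay()), reverse(rankedValue), contentId);
    }

    // Shared prefix of every row that belongs to one chart on one date.
    static String datePrefix(String chart, LocalDate date) {
        return chart + "|" + reverse(date.toEpochDay()) + "|";
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2024, 1, 2);
        System.out.println(rowKey("top", day, 500, "content-1"));
        System.out.println(datePrefix("top", day));
    }
}
```

Because every row for one chart and date shares `datePrefix(...)`, those rows are contiguous in the table, which is what makes a bounded scan over them cheap.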

Sometimes I want to regenerate the chart for a given date, so I want to delete all rows from 'chart|date_reversed_1' through 'chart|date_reversed_2'. Is there a better way than issuing a Delete for each row found by a Scan? All the rows to be deleted are going to be close to each other.

I need to delete the rows because I don't want one item (one content_id) to have multiple entries, which it would have if its ranked_attribute_value has changed (that change is the reason the chart needs to be regenerated).

I'm an HBase beginner, so perhaps I'm misusing rows for something that columns would handle better. If you have design suggestions, cool! Or maybe the charts are better generated into a file (i.e. no HBase for output)? I'm using MapReduce.

Recommended answer

First, regarding range deletes: there is no range delete in HBase yet, AFAIK. There is, however, a way to delete more than one row at a time in the HTableInterface API. Form a Delete object for each row key returned by the scan, put them in a List, and pass the list to the API's delete call. Done! To make the scan faster, do not include any column family in the scan result, since all you need is the row key to delete whole rows.
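The scan-then-batch-delete pattern can be sketched without a cluster. The example below is only a model: a `TreeMap` stands in for the sorted table, `subMap` plays the role of a Scan bounded by a start row (inclusive) and a stop row (exclusive), and removing the collected keys in one call plays the role of passing a List of Delete objects to the table. The class `RangeDeleteSketch` and its methods are hypothetical, and string ordering is used here where HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Model only: a sorted map stands in for an HBase table keyed by row key.
final class RangeDeleteSketch {

    // "Scan" step: collect only the row keys in [startRow, stopRow); no cell values are read.
    static List<String> keysInRange(NavigableMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    // "Batch delete" step: build the whole list first, then remove it in one call.
    static int deleteRange(NavigableMap<String, String> table, String startRow, String stopRow) {
        List<String> batch = keysInRange(table, startRow, stopRow);
        table.keySet().removeAll(batch);
        return batch.size();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("chart|001|a", "x");
        table.put("chart|001|b", "x");
        table.put("chart|002|a", "x");
        table.put("chart|003|a", "x");
        int deleted = deleteRange(table, "chart|001", "chart|003");
        System.out.println("deleted " + deleted + ", remaining " + table.keySet());
    }
}
```

With the real client, the same two steps would be a Scan restricted to the key range followed by a single delete call with the list. Note that the stop row is exclusive in this sketch, so to include the rows of 'chart|date_reversed_2' the stop row has to be the first key after that prefix.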

Second, about the design. My understanding of the requirement is this: there are content items, each with a content id; charts are generated for each content item and that data is stored; a content item can have multiple charts, by date and depending on the rank. In addition, we want the chart of the most recently generated content to show at the top of the table.

Based on that assumption, I would suggest using three tables: auto_id, content_charts and generated_order. The row key for content_charts would be the content id, and the row key for generated_order would be a long that is auto-decremented using the HTableInterface API. To decrement, use '-1' as the amount to offset, and initialize the value to Long.MAX_VALUE in the auto_id table at the first startup of the app, or manually. Now, if you want to delete the chart data, simply clean the column family using a delete, put back the new data, and then make a put in the generated_order table. This way the latest insertion will also be at the top of the generated_order table, which holds the content id as a cell value. If you want to ensure generated_order has only one entry per content, save the generated_order id into content_charts when putting, and before deleting the column family, first delete the corresponding row from generated_order. This way you can look up the charts for a content item using at most 2 gets, with no scan required for the charts.
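The bookkeeping in this design can be modelled in plain Java. This is a sketch under assumptions, not an HBase implementation: in-memory maps stand in for the three tables, an `AtomicLong` stands in for the counter in auto_id (the answer presumably means an atomic increment call with an amount of -1), and the class `GeneratedOrderSketch` with its methods is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Model only: in-memory maps stand in for auto_id, content_charts and generated_order.
final class GeneratedOrderSketch {
    // auto_id: a single counter initialized to Long.MAX_VALUE and decremented by 1 per use.
    private final AtomicLong autoId = new AtomicLong(Long.MAX_VALUE);
    // content_charts: content id -> chart data, plus the generated_order id saved alongside it.
    private final Map<String, String> contentCharts = new HashMap<>();
    private final Map<String, Long> savedOrderId = new HashMap<>();
    // generated_order: decreasing long -> content id; the smallest (newest) key sorts first.
    private final TreeMap<Long, String> generatedOrder = new TreeMap<>();

    void regenerate(String contentId, String chartData) {
        // Delete the old generated_order row first so each content keeps a single entry.
        Long oldId = savedOrderId.get(contentId);
        if (oldId != null) {
            generatedOrder.remove(oldId);
        }
        long newId = autoId.addAndGet(-1);      // offset of -1, as suggested in the answer
        contentCharts.put(contentId, chartData); // replaces the old chart data
        savedOrderId.put(contentId, newId);
        generatedOrder.put(newId, contentId);
    }

    String chartFor(String contentId) {
        return contentCharts.get(contentId);
    }

    String latestContentId() {
        return generatedOrder.isEmpty() ? null : generatedOrder.firstEntry().getValue();
    }

    int orderEntries() {
        return generatedOrder.size();
    }

    public static void main(String[] args) {
        GeneratedOrderSketch store = new GeneratedOrderSketch();
        store.regenerate("a", "chart-v1");
        store.regenerate("b", "chart-v1");
        store.regenerate("a", "chart-v2");
        System.out.println(store.latestContentId() + " " + store.orderEntries());
    }
}
```

Because the ids only ever decrease, the most recently regenerated content always has the smallest key and therefore sorts first, which is the "latest at the top" property the answer is after.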

I hope this helps.
