Maintain statistics across rows in Accumulo

Question

I am relatively new to Accumulo, so would greatly appreciate general tips for doing this better.

I have row IDs that are made up of a time component and a geographic component. I'd like to maintain statistics (counts, sums, etc.) in an iterator of some sort, but would like to emit mutations to other rows as part of the ingest. In other words, as I insert a row:

<timeA>_<geoX> colFam:colQual value

In addition to the mutation above, I'd like to maintain stats in separate rows in the same table (or a different one) as follows:

timeA_countRow colFam:colQual count++
geoX_countRow colFam:colQual count++
timeA_sumRow colFam:colQual sum += value
geoX_sumRow colFam:colQual sum += value

What is the best way to accomplish such a thing? I have definitely seen the stats combiner, but that works within a single row to my understanding. I'd like to maintain stats based on parts of the key...

Thanks!

Recommended answer


In addition to the mutation above, I'd like to maintain stats in separate rows in the same table (or a different one) as follows

This is something that fundamentally does not work with Accumulo. You cannot know, within the confines of an Iterator, about data in a separate row. That's why the StatsCombiner is written in the context of a single row. Any other row is not guaranteed to be contained in the Tablet (physical data boundary).
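
To make the tablet point concrete, here is a toy model in plain Python, not Accumulo code. It assumes a table split at three made-up row boundaries (`SPLITS` and `tablet_for` are hypothetical names) and shows that a data row and one of its stats rows can sort into different tablets, so an iterator running over one cannot see the other.

```python
from bisect import bisect_left

# Hypothetical split points; a real table's splits depend on how it is configured.
SPLITS = ["g", "n", "t"]


def tablet_for(row, splits=SPLITS):
    """Return the index of the tablet whose row range contains `row`.

    Tablet i is modeled as holding rows in (splits[i-1], splits[i]];
    the last tablet has no upper bound.
    """
    return bisect_left(splits, row)


data_tablet = tablet_for("timeA_geoX")      # sorts after "t"
stats_tablet = tablet_for("geoX_countRow")  # sorts between "g" and "n"
print(data_tablet, stats_tablet)
```

With these assumed splits the two rows fall in different tablets, which may well be hosted by different servers.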

A common approach is to maintain this information client-side via a separate table or locality group with a SummingCombiner. When you insert an update for a specific column, you also submit an update to your stats table.
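
The client-side fan-out can be sketched as below. This is a plain-Python model under stated assumptions, not the Accumulo client API: `stats_updates`, `SummingTable`, and `ingest` are hypothetical names, row IDs are assumed to look like `<time>_<geo>` with no underscore in the time part, and `SummingTable` merely imitates a table whose combiner sums every value written to the same key.

```python
from collections import defaultdict


def stats_updates(row_id, col_fam, col_qual, value):
    """Derive the four stats-row deltas that accompany one data-row insert.

    Assumes row IDs look like "<time>_<geo>" and the time part has no underscore.
    """
    time_part, geo_part = row_id.split("_", 1)
    return [
        ((f"{time_part}_countRow", col_fam, col_qual), 1),
        ((f"{geo_part}_countRow", col_fam, col_qual), 1),
        ((f"{time_part}_sumRow", col_fam, col_qual), value),
        ((f"{geo_part}_sumRow", col_fam, col_qual), value),
    ]


class SummingTable:
    """Stand-in for a stats table that sums all values written to a key."""

    def __init__(self):
        self.cells = defaultdict(int)

    def write(self, key, delta):
        # The writer only sends a delta; the table does the aggregation.
        self.cells[key] += delta


def ingest(data_table, stats_table, row_id, col_fam, col_qual, value):
    """Write the data cell, then submit the matching stats deltas."""
    data_table[(row_id, col_fam, col_qual)] = value
    for key, delta in stats_updates(row_id, col_fam, col_qual, value):
        stats_table.write(key, delta)


data, stats = {}, SummingTable()
ingest(data, stats, "timeA_geoX", "colFam", "colQual", 5)
ingest(data, stats, "timeA_geoY", "colFam", "colQual", 7)
print(stats.cells[("timeA_sumRow", "colFam", "colQual")])
```

In a real deployment the writes would presumably go out as mutations to the data table and the stats table, with the summing done by a SummingCombiner configured on the stats columns. The client never reads the current count or sum; it only submits deltas, which is what lets the aggregation happen server-side.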

You could also look into Fluo, which allows you to perform cross-row transactions. It is a different beast from normal Accumulo and is still in beta.
