How to use multiple counters in Flink


Question

(Somewhat related to the question "How to create dynamic metrics in Flink".)

I have a stream of events (someid: String, name: String) and, for monitoring reasons, I need a counter per event ID. In all the Flink documentation and examples I can find, the counter is initialised with a name in the open method of, for instance, a map function.

But in my case I cannot initialise the counter there, as I need one per eventId and I do not know the values in advance. Also, I understand how expensive it would be to create a new counter every time an event passes through the map() method of the MapFunction. Finally, I cannot keep a "cache" of counters, as it would be too big.

Ideally, I would like something like this:

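// Wished-for API, not real Flink: Counter is an interface obtained from the
// operator's metric group, and its inc() does not accept an ID or dimension.
case class Event(id: String, name: String)

class ExampleMapFunction extends RichMapFunction[Event, Event] {
  @transient private var counter: Counter = _

  override def open(parameters: Configuration): Unit = {
    counter = new Counter() // hypothetical: one counter that carries a dimension
  }

  override def map(event: Event): Event = {
    counter.inc(event.id) // hypothetical: increment the count for this event ID
    event
  }
}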

Or, basically, could I implement my own counter that allows me to pass a dimension? If yes, how?

Any advice or best practices for this kind of use case?

Recommended answer

If keeping a cache of the counters would be too big, then I don't think using metrics is going to scale in a way that will satisfy your requirements.
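To make the scaling concern concrete, here is a minimal sketch of what the per-ID "cache" of counters would look like, assuming Flink 1.x and the same `open(Configuration)` lifecycle method used in the question. The class name `PerIdCounterMapFunction`, the group key `eventId`, and the metric name `events` are illustrative choices, not part of the original post.

```scala
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import org.apache.flink.metrics.Counter

import scala.collection.mutable

case class Event(id: String, name: String)

// Illustrative only: registers one Flink metric per distinct event ID.
class PerIdCounterMapFunction extends RichMapFunction[Event, Event] {
  // The "cache": one Counter per event ID seen by this parallel subtask.
  @transient private var counters: mutable.Map[String, Counter] = _

  override def open(parameters: Configuration): Unit = {
    counters = mutable.Map.empty[String, Counter]
  }

  override def map(event: Event): Event = {
    // Register the counter lazily the first time an ID is seen.
    // "eventId" and "events" are assumed names for the group key and metric.
    val counter = counters.getOrElseUpdate(
      event.id,
      getRuntimeContext.getMetricGroup
        .addGroup("eventId", event.id)
        .counter("events"))
    counter.inc()
    event
  }
}
```

Both the map and the set of registered metrics grow with the number of distinct IDs, which is exactly the cost the question wants to avoid.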

Some alternatives:

  • Use side outputs to collect meaningful events in some external, queryable/visualizable data store, e.g., InfluxDB.

  • Hold the info in keyed state, and use broadcast messages to trigger output of the relevant portions of it as desired (again using side outputs).

  • Hold the info in keyed state, and take periodic savepoints, which you then analyze via queries using the State Processor API.
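The keyed-state and side-output ideas above could be combined roughly as follows. This is a sketch under stated assumptions, not a definitive implementation: it assumes Flink 1.x with the Scala DataStream API, and the names `CountPerId`, `IdCount`, `CountTags`, and the tag ID `per-id-counts` are hypothetical. For simplicity it emits the updated count on every event; a real job would more likely emit on a timer or on a broadcast trigger.

```scala
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

case class Event(id: String, name: String)
case class IdCount(id: String, count: Long) // hypothetical side-output record

object CountTags {
  // Hypothetical tag ID; the side output carries per-ID counts.
  val counts: OutputTag[IdCount] = OutputTag[IdCount]("per-id-counts")
}

// Keeps one count per key (event ID) in keyed state instead of in a metric.
class CountPerId extends KeyedProcessFunction[String, Event, Event] {
  @transient private var count: ValueState[java.lang.Long] = _

  override def open(parameters: Configuration): Unit = {
    count = getRuntimeContext.getState(
      new ValueStateDescriptor[java.lang.Long]("count", classOf[java.lang.Long]))
  }

  override def processElement(
      event: Event,
      ctx: KeyedProcessFunction[String, Event, Event]#Context,
      out: Collector[Event]): Unit = {
    // State is null the first time a key is seen, so default to 0.
    val updated = Option(count.value()).map(_.longValue).getOrElse(0L) + 1L
    count.update(updated)
    ctx.output(CountTags.counts, IdCount(event.id, updated)) // side output
    out.collect(event) // main output: events pass through unchanged
  }
}
```

Assuming `events` is a `DataStream[Event]`, it might be wired up like this, with the side output sent to whatever external store is used (sink not shown):

```scala
val main = events.keyBy(_.id).process(new CountPerId)
val perIdCounts = main.getSideOutput(CountTags.counts)
```

Because the counts live in keyed state, they are partitioned by ID and included in savepoints, which is what makes the savepoint-based analysis in the last alternative possible.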
