在运行时部署流处理拓扑? [英] Deploy stream processing topology on runtime?

查看:59
本文介绍了在运行时部署流处理拓扑?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

全部,

我有一个需要重新记录一些较旧数据的要求.我们有一个多阶段的管道,其来源是一个Kafka主题.一旦记录被输入,它就将经历一系列步骤(大约10个).每个步骤都会对推送到源主题的原始JSON对象进行按摩,并推送到目标主题.

I have a requirement where in I need to re-ingest some of my older data. We have a multi staged pipeline , the source of which is a Kafka topic. Once a record is fed into that, it runs through a series of steps(about 10). Each step massages the original JSON object pushed to the source topic and pushes to a destination topic.

现在,有时候,我们需要重新摄取较旧的数据,并应用我上面描述的步骤的子集.我们打算将这些记录最多的记录推送到另一个主题,以免阻止正在通过的实时"数据,这可能意味着我可能只需要应用上述10个步骤中的1个步骤即可.从上方在整个管道中运行它非常浪费,因为每个步骤都占用大量资源,并需要多个外部服务.另外,我可能需要一次重新记录数百万个条目,因此可能会阻塞我的外部服务.最后,这些重新安排活动并不那么频繁,有时可能每周仅发生一次.

Now, sometimes, we need to re ingest the older data and apply a subset of the steps I described above. We intend to push these re-ingest records to a different topic, so as not to block the "live" data that's coming through, This might mean I may need to apply just 1 step from the 10 I described above. Running it through the entire pipeline from above is wasteful, as each step is pretty resource intensive and calls multiple external services. Also, I might need to re-ingest millions of entries at time, so I might choke my external services. Lastly, these re ingsetion activities are not that frequent and may happen just once a week at times.

让我们说一下是否能够找出需要执行的步骤.这可以通过基本规则引擎来完成.完成此操作后,我需要能够动态创建拓扑/能够部署拓扑,并从新创建的主题开始进行处理.同样,我想在运行时上进行部署的原因是,尽管这些活动对业务至关重要,但它们并不会那么频繁地发生.而且每次,我需要执行的步骤可能都会更改,因此我们不能总是让整个管道都在运行.

Let's say if I am able to figure out what steps from above I need to execute. That can be done via a basic rules engine. Once that has been done, then I need to be able to dynamically create a topology/ be able to deploy it which starts processing from the newly created topic. Again, the reason I want to deploy on runtime is that these activities, albeit business critical, don't happen that frequently. And every time, the steps I need to execute might change, so we can't always have the entire pipeline running.

有没有办法做到这一点?还是我什至在朝着正确的方向思考,即我上面概述的方法是否正确?任何指针都会有所帮助.

Is there a way to achieve this? Or may be am I even thinking in the right direction i.e is the approach I outlined above is even correct? Any pointers would be helpful.

推荐答案

我建议您为重新摄取的数据创建这些动态拓扑,作为单独的Kafka Streams应用程序.而且,如果您想以编程方式即时创建此类应用程序并在完成后将其终止,请考虑以下方法:

I'd suggest you creating those dynamic topologies for re-ingested data as a separate Kafka Streams application. And if you want to programmatically create such applications on-the-fly and terminate them when done consider the following ways:

  1. 使每个步骤都是可配置的:您可能会传入一个旋钮参数列表,并根据它们来动态创建最复杂的拓扑.
  2. 如果要自动触发此类重新摄入管道,请考虑使用一些集成的部署工具来调用KafkaStreams#start.

这篇关于在运行时部署流处理拓扑?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆