在运行时部署流处理拓扑? [英] Deploy stream processing topology on runtime?

查看：59 发布时间：2020/9/3 19:20:53 apache-kafka apache-storm apache-kafka-streams flink-streaming

本文介绍了在运行时部署流处理拓扑?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

全部，

我有一个需要重新记录一些较旧数据的要求.我们有一个多阶段的管道，其来源是一个Kafka主题.一旦记录被输入，它就将经历一系列步骤(大约10个).每个步骤都会对推送到源主题的原始JSON对象进行按摩，并推送到目标主题.

I have a requirement where in I need to re-ingest some of my older data. We have a multi staged pipeline , the source of which is a Kafka topic. Once a record is fed into that, it runs through a series of steps(about 10). Each step massages the original JSON object pushed to the source topic and pushes to a destination topic.

现在，有时候，我们需要重新摄取较旧的数据，并应用我上面描述的步骤的子集.我们打算将这些记录最多的记录推送到另一个主题，以免阻止正在通过的实时"数据，这可能意味着我可能只需要应用上述10个步骤中的1个步骤即可.从上方在整个管道中运行它非常浪费，因为每个步骤都占用大量资源，并需要多个外部服务.另外，我可能需要一次重新记录数百万个条目，因此可能会阻塞我的外部服务.最后，这些重新安排活动并不那么频繁，有时可能每周仅发生一次.

Now, sometimes, we need to re ingest the older data and apply a subset of the steps I described above. We intend to push these re-ingest records to a different topic, so as not to block the "live" data that's coming through, This might mean I may need to apply just 1 step from the 10 I described above. Running it through the entire pipeline from above is wasteful, as each step is pretty resource intensive and calls multiple external services. Also, I might need to re-ingest millions of entries at time, so I might choke my external services. Lastly, these re ingsetion activities are not that frequent and may happen just once a week at times.

让我们说一下是否能够找出需要执行的步骤.这可以通过基本规则引擎来完成.完成此操作后，我需要能够动态创建拓扑/能够部署拓扑，并从新创建的主题开始进行处理.同样，我想在运行时上进行部署的原因是，尽管这些活动对业务至关重要，但它们并不会那么频繁地发生.而且每次，我需要执行的步骤可能都会更改，因此我们不能总是让整个管道都在运行.

Let's say if I am able to figure out what steps from above I need to execute. That can be done via a basic rules engine. Once that has been done, then I need to be able to dynamically create a topology/ be able to deploy it which starts processing from the newly created topic. Again, the reason I want to deploy on runtime is that these activities, albeit business critical, don't happen that frequently. And every time, the steps I need to execute might change, so we can't always have the entire pipeline running.

有没有办法做到这一点?还是我什至在朝着正确的方向思考，即我上面概述的方法是否正确?任何指针都会有所帮助.

Is there a way to achieve this? Or may be am I even thinking in the right direction i.e is the approach I outlined above is even correct? Any pointers would be helpful.

在运行时部署流处理拓扑? [英] Deploy stream processing topology on runtime?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在运行时部署流处理拓扑? [英] Deploy stream processing topology on runtime?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭