Deploy stream processing topology on runtime?


Question

Hi all,

I have a requirement wherein I need to re-ingest some of my older data. We have a multi-stage pipeline whose source is a Kafka topic. Once a record is fed into it, it runs through a series of steps (about 10). Each step massages the original JSON object pushed to the source topic and pushes the result to a destination topic.

Now, sometimes, we need to re-ingest the older data and apply only a subset of the steps described above. We intend to push these re-ingested records to a different topic, so as not to block the "live" data coming through. This might mean I need to apply just 1 of the 10 steps described above. Running the data through the entire pipeline is wasteful, as each step is pretty resource-intensive and calls multiple external services. Also, I might need to re-ingest millions of entries at a time, so I might choke my external services. Lastly, these re-ingestion activities are not that frequent and may happen just once a week at times.

Let's say I am able to figure out which of the steps above I need to execute; that can be done via a basic rules engine. Once that is done, I need to be able to dynamically create a topology and deploy it, so that it starts processing from the newly created topic. Again, the reason I want to deploy at runtime is that these activities, albeit business-critical, don't happen that frequently. And the steps I need to execute might change every time, so we can't always have the entire pipeline running.
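The step-selection idea could be sketched like this (plain Java, with hypothetical step names; the real steps would be the pipeline's own resource-intensive transformations):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class StepSelector {
    // Hypothetical registry of pipeline steps, keyed by name. In the real
    // pipeline each entry would be one of the ~10 heavyweight stages.
    static final Map<String, UnaryOperator<String>> STEPS = new LinkedHashMap<>();
    static {
        STEPS.put("normalize", json -> json.trim());
        STEPS.put("enrich",    json -> json.replace("}", ",\"enriched\":true}"));
    }

    // A basic rules engine would produce the list of wanted step names;
    // here we just look up the matching transformations in pipeline order.
    static List<UnaryOperator<String>> select(List<String> wanted) {
        List<UnaryOperator<String>> chain = new ArrayList<>();
        for (String name : wanted) {
            UnaryOperator<String> step = STEPS.get(name);
            if (step == null) throw new IllegalArgumentException("unknown step: " + name);
            chain.add(step);
        }
        return chain;
    }

    // Apply only the selected subset of steps to one record.
    static String apply(String json, List<UnaryOperator<String>> chain) {
        for (UnaryOperator<String> step : chain) json = step.apply(json);
        return json;
    }

    public static void main(String[] args) {
        System.out.println(apply("{\"id\":1}", select(List.of("enrich"))));
        // prints {"id":1,"enriched":true}
    }
}
```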

Is there a way to achieve this? Or am I even thinking in the right direction, i.e. is the approach I outlined above even correct? Any pointers would be helpful.

Answer

I'd suggest creating those dynamic topologies for re-ingested data as a separate Kafka Streams application. If you want to programmatically create such applications on the fly and terminate them when done, consider the following:

  1. Make each of your steps configurable: you could pass in a list of knob parameters and dynamically create the re-ingestion topology based on them.
  2. If you want to trigger such re-ingestion pipelines automatically, consider using some integrated deployment tooling to call KafkaStreams#start.
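To make the create/start/terminate lifecycle concrete, here is a rough sketch. `ReingestApp` is a hypothetical wrapper: in a real deployment its constructor would build a Kafka Streams `Topology` from the chosen steps, and `start()`/`close()` would delegate to `KafkaStreams#start` and `KafkaStreams#close`; processing is simulated in-process to keep the sketch self-contained.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical wrapper around a dynamically created streams application.
// Real version: constructor builds a Topology from the selected steps,
// start() calls KafkaStreams#start, close() calls KafkaStreams#close.
class ReingestApp implements AutoCloseable {
    private final List<UnaryOperator<String>> chain;
    private boolean running;

    ReingestApp(List<UnaryOperator<String>> chain) {
        this.chain = chain;
    }

    void start() {            // real version: new KafkaStreams(topology, props).start()
        running = true;
    }

    @Override
    public void close() {     // real version: KafkaStreams#close, releasing all resources
        running = false;
    }

    // Simulates one record flowing from the re-ingestion topic through the
    // selected subset of steps on its way to the destination topic.
    String process(String record) {
        if (!running) throw new IllegalStateException("application not started");
        for (UnaryOperator<String> step : chain) record = step.apply(record);
        return record;
    }
}
```

Since re-ingestion runs are infrequent, the application would exist only for the duration of a run: create it, start it, drain the re-ingestion topic, then close it so the resource-heavy steps aren't running the rest of the week.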

