运行时的 Apache Flink 映射 [英] Apache Flink Mapping at Runtime

查看:23
本文介绍了运行时的 Apache Flink 映射的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经构建了一个 flink 流作业来从 kafka 读取一个 xml 文件,转换该文件并将其写入数据库中.由于 xml 文件中的属性与数据库列名称不匹配,因此我为映射构建了一个 switch case.

i have build a flink streaming job to read a xml file from kafka convert the file and write it in a database. As the attributes in the xml file don't match the database column names i have build a switch case for the mapping.

由于这不是很灵活,我想从代码中去掉这个硬连线的映射信息.首先,我想出了一个映射文件的想法,它可能如下所示:

As this is not really flexible i want to take this hardwired mapping information out of the code. First of all i came up with the idea of a mapping file which could look like this:

path.in.xml.to.attribut=database.column.name

当前的作业逻辑如下:

switch(path.in.xml.to.attribute){
    case "example.one.name":
        return "name";

对于映射文件,我想我会使用 Map 将映射数据存储为键值对.

With the mapping file i guess i would work with an Map to store the mapping data as a Key-Value-Pair.

这将使工作更加灵活,就像现在一样.仍然有一个缺点是,对于我想要应用的此配置中的每个更改,我都必须重新启动 flink 作业.

This would make the job more flexible as it is right now. Still a downside would be that for every change in this configuration i want to apply i would have to restart the flink job.

我的问题是是否可以在运行时注入这种映射逻辑,例如通过自己的 kafka 主题.当这种实现成为可能时,它会如何作为示例.

My question is if it is possible to inject this kind of mapping logic at runtime, for example via an own kafka topic. And when this kind of implementation is possible how could it look like as an example.

推荐答案

如果您唯一需要的是能够更新 xml 属性和数据库列名之间的映射,那么 可以使用广播状态模式.另外,Apache Flink 中的广播状态实用指南是也有用.

If the only you need is to be able to update the mapping between the xml attributes and database column names, then the The Broadcast State Pattern can be used. Also, A Practical Guide to Broadcast State in Apache Flink is usefull as well.

这个想法是有一个流,订阅你自己的 kafka 主题,并带有数据库映射,将更新广播到所有任务管理器.这些操作符将维护这个 Map 作为一个状态,你可以使用这个映射状态来解析列名,即代替 switch(path.in.xml.to.属性) 使用 map.get(path.in.xml.to.attribute)).在这种情况下,map 运算符应替换为 BroadcastProcessFunction.

The idea is to have a stream, subscribed to your own kafka topic with database mappings which broadcasts the updates to all task managers. These operators will maintain this Map<String, String> as a state and you can use this mapping state to resolve the column name, i.e. instead of switch(path.in.xml.to.attribute) use map.get(path.in.xml.to.attribute)). The map operator in this case should be replaced with BroadcastProcessFunction.

这篇关于运行时的 Apache Flink 映射的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆