运行时的Apache Flink映射 [英] Apache Flink Mapping at Runtime

查看:82
本文介绍了运行时的Apache Flink映射的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经建立了一个flink流作业,以从kafka读取xml文件并将其转换并写入数据库.由于xml文件中的属性与数据库列名称不匹配,因此我为映射建立了一个转换案例.

i have build a flink streaming job to read a xml file from kafka convert the file and write it in a database. As the attributes in the xml file don't match the database column names i have build a switch case for the mapping.

因为这不是很灵活,所以我想从代码中删除这些有线映射信息.首先,我想到了一个映射文件的想法,看起来像这样:

As this is not really flexible i want to take this hardwired mapping information out of the code. First of all i came up with the idea of a mapping file which could look like this:

path.in.xml.to.attribut=database.column.name

当前的作业逻辑如下:

switch(path.in.xml.to.attribute){
    case "example.one.name":
        return "name";

我想使用映射文件时,我会使用Map来将映射数据存储为键值对.

With the mapping file i guess i would work with an Map to store the mapping data as a Key-Value-Pair.

这将使工作变得更加灵活.缺点还在于,对于此配置中的所有更改,我要应用,都必须重新启动flink作业.

This would make the job more flexible as it is right now. Still a downside would be that for every change in this configuration i want to apply i would have to restart the flink job.

我的问题是,是否有可能在运行时注入这种映射逻辑,例如通过自己的kafka主题.当这种实现成为可能时,它看起来像一个例子.

My question is if it is possible to inject this kind of mapping logic at runtime, for example via an own kafka topic. And when this kind of implementation is possible how could it look like as an example.

推荐答案

如果仅需要更新xml属性和数据库列名称之间的映射,则 Apache Flink中的广播状态实用指南是也是有用的.

If the only you need is to be able to update the mapping between the xml attributes and database column names, then the The Broadcast State Pattern can be used. Also, A Practical Guide to Broadcast State in Apache Flink is usefull as well.

这个想法是让一个流通过数据库映射订阅您自己的kafka主题,该映射将更新广播到所有任务管理器.这些运算符将将此 Map< String,String> 保持为状态,并且您可以使用此映射状态来解析列名,即代替 switch(path.in.xml.to.属性)使用 map.get(path.in.xml.to.attribute)).在这种情况下, map 运算符应替换为 BroadcastProcessFunction .

The idea is to have a stream, subscribed to your own kafka topic with database mappings which broadcasts the updates to all task managers. These operators will maintain this Map<String, String> as a state and you can use this mapping state to resolve the column name, i.e. instead of switch(path.in.xml.to.attribute) use map.get(path.in.xml.to.attribute)). The map operator in this case should be replaced with BroadcastProcessFunction.

这篇关于运行时的Apache Flink映射的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆