Kafka HDFS Connector - Without Full Confluent

Question

I have a running instance of Kafka 0.10, and I'm currently using Gobblin to store data in HDFS. I want to switch to Kafka Connect, and while researching I found that Confluent provides a connector.

However, is there a way to use this connector without using the entire Confluent Platform? For example, can I copy the relevant scripts from the Confluent source and somehow make my Kafka instance use them? I'm still learning my way through this stuff, so I'm not yet well versed in this space.

Thanks.

Solution

Yes, it is possible; I've done it. I use a slightly modified standalone Confluent HDFS connector that runs in a Docker container. However, you will have to use Schema Registry too, because the connector is tightly coupled to it. You will also have to send messages in a special format: to support automatic schema recognition, Confluent's Kafka consumers expect an internal message format. So, to be compatible with Confluent consumers, your producers must compose messages according to the following format.

  • Header (5 bytes)
    • The first byte of the message, the "magic byte", should always be 0.
    • The next 4 bytes should be the Id of the schema in Schema Registry, encoded in big-endian format.
  • Payload (Avro/Parquet object, binary encoded).
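The layout above can be sketched in a few lines of Python. This is only an illustration of the byte layout, not the code I actually run: the names `frame_message` and `split_message` are made up for this example, and the payload is assumed to be already serialized by whatever Avro library you use.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # first header byte, always 0 per the format above
HEADER_SIZE = 5  # 1 magic byte + 4 bytes of schema Id


def frame_message(schema_id: int, payload: bytes) -> bytes:
    """Prepend the 5-byte header to an already-encoded payload."""
    # ">bI" = big-endian, one signed byte, one unsigned 4-byte integer, no padding
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def split_message(message: bytes) -> Tuple[int, bytes]:
    """Reverse of frame_message: return (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[HEADER_SIZE:]
```

The bytes returned by `frame_message` are what the producer would send as the message value; `split_message` is handy for inspecting what is already in a topic.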

P.S. Be very careful when sending messages to the topic, because if a message does not match the schema, or a schema with that Id does not exist in the registry, the consumer fails silently: the worker thread stops, but the application still hangs in memory and does not exit.
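One way to reduce that risk is to check each message on the producer side before it is sent. The sketch below is a hypothetical helper, not part of the connector: `find_framing_problem` and the `known_schema_ids` set are names invented for this example, and how you fill that set (for instance by asking your Schema Registry which Ids exist) is left to you. It checks only the header, not whether the payload actually matches the schema.

```python
import struct
from typing import Optional, Set


def find_framing_problem(message: bytes, known_schema_ids: Set[int]) -> Optional[str]:
    """Return a description of the first header problem found, or None if the header looks fine."""
    if len(message) < 5:
        return "message is shorter than the 5-byte header"
    # Header layout: 1 magic byte, then the schema Id as a big-endian 4-byte integer
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != 0:
        return "magic byte is %d, expected 0" % magic
    if schema_id not in known_schema_ids:
        return "schema Id %d is not in the set of known Ids" % schema_id
    return None
```

Calling this just before the producer's send call and refusing to send when it returns a string keeps malformed messages out of the topic in the first place.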
