Kafka Connect setup to send records from Aurora using AWS MSK


Problem Description


I have to send records from Aurora/MySQL to MSK, and from there to the Elasticsearch service.

Aurora --> Kafka Connect ---> AWS MSK ---> Kafka Connect ---> Elasticsearch

A record in the Aurora table looks something like the line below; I think records will go to AWS MSK in this format.

"o36347-5d17-136a-9749-Oe46464",0,"NEW_CASE","WRLDCHK","o36347-5d17-136a-9749-Oe46464","<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?><caseCreatedPayload><batchDetails/>","CASE",08-JUL-17 10.02.32.217000000 PM,"TIME","UTC","ON","0a348753-5d1e-17a2-9749-3345,MN4,","","0a348753-5d1e-17af-9749-FGFDGDFV","EOUHEORHOE","2454-5d17-138e-9749-setwr23424","","","",,"","",""

So in order for Elasticsearch to consume the records I need a proper schema, which means I have to use the Schema Registry.

My questions

Question 1

Is the Schema Registry required for the above type of message, and if so, how should I use it? Do I have to create a JSON structure for this, and if yes, where do I keep it? I need more help here to understand this.

I have edited

vim /usr/local/confluent/etc/schema-registry/schema-registry.properties

I mentioned the ZooKeeper address there, but I do not understand what kafkastore.topic=_schemas is, or how to link this to a custom schema.

Even so, when I started it I got this error:

Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Topic _schemas not present in metadata after 60000 ms.

Which I was expecting, because I did not do anything about the schema yet.

I do have the JDBC connector installed, and when I start it I get the error below:

Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123 for configuration Couldn't open connection to jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
Invalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123 for configuration Couldn't open connection to jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`

Question 2: Can I create two connectors on one EC2 instance (the JDBC one and the Elasticsearch one)? If yes, do I have to start each one in a separate CLI?

Question 3: When I open vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties, I see only property values like the below:

name=test-source-sqlite-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://123871289-eruyre.cluster-ceyey.us-east-1.rds.amazonaws.com:3306/trf?user=admin&password=Welcome123
mode=incrementing
incrementing.column.name=id
topic.prefix=trf-aurora-fspaudit-

In the above properties file, where can I mention the schema name and the table name?

Based on the answer, I am updating my Kafka Connect JDBC configuration.

--------------- start JDBC Connect / Elasticsearch ---------------

wget http://packages.confluent.io/archive/5.2/confluent-5.2.0-2.11.tar.gz -P ~/Downloads/
tar -zxvf ~/Downloads/confluent-5.2.0-2.11.tar.gz -C ~/Downloads/
sudo mv ~/Downloads/confluent-5.2.0 /usr/local/confluent

wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.48.tar.gz
tar -xzf  mysql-connector-java-5.1.48.tar.gz
sudo mv mysql-connector-java-5.1.48 /usr/local/confluent/share/java/kafka-connect-jdbc

And then

vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties

Then I modified the properties below:

connection.url=jdbc:mysql://fdgfgdfgrter.us-east-1.rds.amazonaws.com:3306/trf
mode=incrementing
connection.user=admin
connection.password=Welcome123
table.whitelist=PANStatementInstanceLog
schema.pattern=dbo

Lastly, I modified:

vim /usr/local/confluent/etc/kafka/connect-standalone.properties

and here I modified the properties below:

bootstrap.servers=b-3.205147-ertrtr.erer.c5.ertert.us-east-1.amazonaws.com:9092,b-6.ertert-riskaudit.ertet.c5.kafka.us-east-1.amazonaws.com:9092,b-1.ertert-riskaudit.ertert.c5.kafka.us-east-1.amazonaws.com:9092
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/local/confluent/share/java

When I list the topics, I do not see any topic listed for the table name.
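
(For context, a typical way to list topics, assuming the Apache Kafka CLI tools bundled with Confluent; the bootstrap address below is a placeholder for the MSK broker addresses used in connect-standalone.properties:)

/usr/local/confluent/bin/kafka-topics --list \
  --bootstrap-server b-1.example.kafka.us-east-1.amazonaws.com:9092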

Stack trace of the error message:

[2020-01-03 07:40:57,169] ERROR Failed to create job for /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties (org.apache.kafka.connect.cli.ConnectStandalone:108)
[2020-01-03 07:40:57,169] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:119)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 2 error(s):
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
        at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:79)
        at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:66)
        at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:116)
Caused by: org.apache.kafka.connect.runtime.rest.errors.BadRequestException: Connector configuration is invalid and contains the following 2 error(s):
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
Invalid value com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. for configuration Couldn't open connection to jdbc:mysql://****.us-east-1.rds.amazonaws.com:3306/trf
You can also find the above list of errors at the endpoint `/{connectorType}/config/validate`
        at org.apache.kafka.connect.runtime.AbstractHerder.maybeAddConfigErrors(AbstractHerder.java:423)
        at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:188)
        at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:113)

curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" \
  IPaddressOfKCnode:8083/connectors/ -d '{
    "name": "emp-connector",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "tasks.max": "1",
      "connection.url": "jdbc:mysql://IPaddressOfLocalMachine:3306/test_db?user=root&password=pwd",
      "table.whitelist": "emp",
      "mode": "timestamp",
      "topic.prefix": "mysql-"
    }
  }'

Solution

schema registry is required?

No. You can enable schemas in JSON records, and the JDBC source can create them for you based on the table information:

value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
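
With schemas.enable=true, the JsonConverter wraps every record in a schema/payload envelope. As a rough sketch of what one row could look like on the topic (the field names here are made up for illustration, not taken from the actual table):

{
  "schema": {
    "type": "struct",
    "name": "PANStatementInstanceLog",
    "fields": [
      { "field": "id", "type": "int64", "optional": false },
      { "field": "case_type", "type": "string", "optional": true }
    ]
  },
  "payload": { "id": 42, "case_type": "NEW_CASE" }
}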

Mentioned ZooKeeper, but I do not understand what kafkastore.topic=_schemas is

If you want to use the Schema Registry, you should be using kafkastore.bootstrap.servers with the Kafka address, not ZooKeeper. So remove kafkastore.connection.url.
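
As a minimal sketch of schema-registry.properties, assuming the MSK brokers are reachable at a placeholder address (kafkastore.topic defaults to _schemas; shown here only for clarity):

listeners=http://0.0.0.0:8081
kafkastore.bootstrap.servers=PLAINTEXT://b-1.example.kafka.us-east-1.amazonaws.com:9092
kafkastore.topic=_schemas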

Please read the docs for explanations of all properties

I did not do anything about the schema.

Doesn't matter. The _schemas topic gets created when the Registry first starts.
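
One caveat worth checking: MSK clusters typically have automatic topic creation disabled by default, in which case the Registry cannot create _schemas itself and you would see exactly the TimeoutException above. If so, create the topic manually (single partition, compacted); a sketch with a placeholder broker address:

/usr/local/confluent/bin/kafka-topics --create --topic _schemas \
  --partitions 1 --replication-factor 3 --config cleanup.policy=compact \
  --bootstrap-server b-1.example.kafka.us-east-1.amazonaws.com:9092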

Can I create two connectors on one EC2 instance?

Yes (ignoring available JVM heap space). Again, this is detailed in the Kafka Connect documentation.

Using standalone mode, you first pass the Connect worker configuration, then up to N connector property files, all in one command.
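
For example (a sketch; the two connector file names are placeholders for your JDBC source and Elasticsearch sink configs):

/usr/local/confluent/bin/connect-standalone \
  /usr/local/confluent/etc/kafka/connect-standalone.properties \
  mysql-source.properties elasticsearch-sink.properties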

Using distributed mode, you use the Kafka Connect REST API

https://docs.confluent.io/current/connect/managing/configuring.html

When I open vim /usr/local/confluent/etc/kafka-connect-jdbc/source-quickstart-sqlite.properties

First of all, that file is for SQLite, not MySQL/Postgres. You don't need to use the quickstart files; they are only there for reference.
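
For example, a MySQL-oriented source file could look like the following, a sketch assembled from the values already in the question (the connector name is arbitrary; note that schema.pattern=dbo is a SQL Server convention and does not apply to MySQL):

name=trf-aurora-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:mysql://fdgfgdfgrter.us-east-1.rds.amazonaws.com:3306/trf
connection.user=admin
connection.password=Welcome123
table.whitelist=PANStatementInstanceLog
mode=incrementing
incrementing.column.name=id
topic.prefix=trf-aurora-fspaudit-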

Again, all properties are well documented

https://docs.confluent.io/current/connect/kafka-connect-jdbc/index.html#connect-jdbc

I do have the JDBC connector installed and when I start it I get the below error

Here's more information about how you can debug that

https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/
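
A common cause of the "No suitable driver found" error from the question is that the driver JAR itself is not on the connector's classpath; the JAR (not just the unpacked directory) typically needs to sit directly inside the kafka-connect-jdbc directory. A sketch of the fix and check (the exact JAR name inside the MySQL archive may carry a -bin suffix):

cp mysql-connector-java-5.1.48/mysql-connector-java-5.1.48*.jar \
  /usr/local/confluent/share/java/kafka-connect-jdbc/
ls /usr/local/confluent/share/java/kafka-connect-jdbc/ | grep mysql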


As stated before, I would personally suggest using Debezium/CDC where possible

Debezium Connector for RDS Aurora
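
For reference, a hedged sketch of what a Debezium MySQL source connector config could look like (property names as of Debezium 1.x; the hostname and credentials are modeled on the question, the server id and history topic are placeholders — Aurora MySQL also needs binary logging with binlog_format=ROW enabled in the cluster parameter group before CDC will work):

{
  "name": "aurora-cdc-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "fdgfgdfgrter.us-east-1.rds.amazonaws.com",
    "database.port": "3306",
    "database.user": "admin",
    "database.password": "Welcome123",
    "database.server.id": "184054",
    "database.server.name": "trf-aurora",
    "table.whitelist": "trf.PANStatementInstanceLog",
    "database.history.kafka.bootstrap.servers": "b-1.example.kafka.us-east-1.amazonaws.com:9092",
    "database.history.kafka.topic": "schema-changes.trf"
  }
}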
