Not able to process Kafka JSON message with Flink siddhi library


Question

I am trying to create a simple application that consumes a Kafka message, applies a CQL transform, and publishes the result back to Kafka. Below is the code:

Java: 1.8, Flink: 1.13, Scala: 2.11, flink-siddhi: 2.11-0.2.2-SNAPSHOT

I am using the library: https://github.com/haoch/flink-siddhi

Input JSON to Kafka:

{
   "awsS3":{
      "ResourceType":"aws.S3",
      "Details":{
         "Name":"crossplane-test",
         "CreationDate":"2020-08-17T11:28:05+00:00"
      },
      "AccessBlock":{
         "PublicAccessBlockConfiguration":{
            "BlockPublicAcls":true,
            "IgnorePublicAcls":true,
            "BlockPublicPolicy":true,
            "RestrictPublicBuckets":true
         }
      },
      "Location":{
         "LocationConstraint":"us-west-2"
      }
   }
}
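
For reference, one way to publish this payload to the input topic is with the plain Kafka Java producer; a minimal sketch (the class name is made up, the payload literal is abbreviated, and the broker/topic values are the ones used in the code below):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PublishSampleEvent {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Abbreviated version of the sample document above.
        String payload = "{\"awsS3\":{\"ResourceType\":\"aws.S3\","
                + "\"Details\":{\"Name\":\"crossplane-test\",\"CreationDate\":\"2020-08-17T11:28:05+00:00\"}}}";

        // try-with-resources closes (and flushes) the producer.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("EVENT_STREAM_INPUT", payload));
        }
    }
}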

Main class:

public class S3SidhiApp {
    public static void main(String[] args) {
        internalStreamSiddhiApp.start();
        //kafkaStreamApp.start();
    }
}

App class:

package flinksidhi.app;

import com.google.gson.JsonObject;
import flinksidhi.event.s3.source.S3EventSource;

import io.siddhi.core.SiddhiManager;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.siddhi.SiddhiCEP;
import org.json.JSONObject;


import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

import static flinksidhi.app.connector.Consumers.createInputMessageConsumer;
import static flinksidhi.app.connector.Producer.*;

public class internalStreamSiddhiApp {

    private static final String inputTopic = "EVENT_STREAM_INPUT";
    private static final String outputTopic = "EVENT_STREAM_OUTPUT";
    private static final String consumerGroup = "EVENT_STREAM1";
    private static final String kafkaAddress = "localhost:9092";
    private static final String zkAddress = "localhost:2181";

    private static final String S3_CQL1 = "from inputStream select * insert into temp";
    private static final String S3_CQL = "from inputStream select json:toObject(awsS3) as obj insert into temp;" +
            "from temp select json:getString(obj,'$.awsS3.ResourceType') as affected_resource_type," +
            "json:getString(obj,'$.awsS3.Details.Name') as affected_resource_name," +
            "json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration') as encryption," +
            "json:getString(obj,'$.awsS3.Encryption.ServerSideEncryptionConfiguration.Rules[0].ApplyServerSideEncryptionByDefault.SSEAlgorithm') as algorithm insert into temp2; " +
            "from temp2 select  affected_resource_name,affected_resource_type, " +
            "ifThenElse(encryption == ' ','Fail','Pass') as state," +
            "ifThenElse(encryption != ' ' and algorithm == 'aws:kms','None','Critical') as severity insert into outputStream";


    public static void start(){
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //DataStream<String> inputS = env.addSource(new S3EventSource());

        //Flink kafka stream consumer
        FlinkKafkaConsumer<String> flinkKafkaConsumer =
                createInputMessageConsumer(inputTopic, kafkaAddress,zkAddress, consumerGroup);

        //Add Data stream source -- flink consumer
        DataStream<String> inputS = env.addSource(flinkKafkaConsumer);
        SiddhiCEP cep = SiddhiCEP.getSiddhiEnvironment(env);

        cep.registerExtension("json:toObject", io.siddhi.extension.execution.json.function.ToJSONObjectFunctionExtension.class);
        cep.registerExtension( "json:getString", io.siddhi.extension.execution.json.function.GetStringJSONFunctionExtension.class);
        cep.registerStream("inputStream", inputS, "awsS3");


        inputS.print();

        System.out.println(cep.getDataStreamSchemas());
        //json needs extension jars to present during runtime.
        DataStream<Map<String,Object>> output = cep
                .from("inputStream")
                .cql(S3_CQL1)
                .returnAsMap("temp");


        //Flink kafka stream Producer
        FlinkKafkaProducer<Map<String, Object>> flinkKafkaProducer =
                createMapProducer(env,outputTopic, kafkaAddress);

        //Add Data stream sink -- flink producer
        output.addSink(flinkKafkaProducer);
        output.print();


        try {
            env.execute();
        } catch (Exception e) {
            e.printStackTrace();
        }

    }

}

Consumer class:

package flinksidhi.app.connector;


import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.json.JSONObject;

import java.util.Properties;
public class Consumers {
    public static FlinkKafkaConsumer<String> createInputMessageConsumer(String topic, String kafkaAddress, String zookeeprAddr, String kafkaGroup ) {
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", kafkaAddress);
        properties.setProperty("zookeeper.connect", zookeeprAddr);
        properties.setProperty("group.id",kafkaGroup);
        FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<String>(
                topic,new SimpleStringSchema(),properties);
        return consumer;
    }
}

Producer class:

package flinksidhi.app.connector;

import flinksidhi.app.util.ConvertJavaMapToJson;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchema;
import org.json.JSONObject;

import java.util.Map;

public class Producer {

    public static FlinkKafkaProducer<Tuple2> createStringProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {

        return new FlinkKafkaProducer<Tuple2>(kafkaAddress, topic, new AverageSerializer());
    }

    public static FlinkKafkaProducer<Map<String,Object>> createMapProducer(StreamExecutionEnvironment env, String topic, String kafkaAddress) {

        return new FlinkKafkaProducer<Map<String,Object>>(kafkaAddress, topic, new SerializationSchema<Map<String, Object>>() {
            @Override
            public void open(InitializationContext context) throws Exception {

            }

            @Override
            public byte[] serialize(Map<String, Object> stringObjectMap) {
                String json = ConvertJavaMapToJson.convert(stringObjectMap);
                return json.getBytes();
            }
        });
    }
}

I have tried many things, but the code where the CQL is invoked is never called and no error is reported, so I am not sure where it is going wrong.

If I instead create an internal stream source that returns the same input JSON as a string, the same pipeline works.
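
For reference, an in-process source along those lines could be sketched as follows (this is only a guess at what S3EventSource does; the class name and the trimmed-down JSON literal are placeholders):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Hypothetical stand-in for flinksidhi.event.s3.source.S3EventSource:
// emits the sample document once as a plain JSON string.
public class InlineS3EventSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        String json = "{\"awsS3\":{\"ResourceType\":\"aws.S3\",\"Details\":{\"Name\":\"crossplane-test\"}}}";
        if (running) {
            ctx.collect(json);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}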

Answer

Initial guess: if you are using event time, are you sure you have defined watermarks correctly? As stated in the docs:

(...) an incoming element is initially put in a buffer where elements are sorted in ascending order based on their timestamp, and when a watermark arrives, all the elements in this buffer with timestamps smaller than that of the watermark are processed (...)
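
If the job does run on event time, one way to make sure watermarks are actually emitted is to attach a WatermarkStrategy to the Kafka consumer before adding it as a source. A minimal sketch (the helper class, the 5-second bound and the wall-clock timestamp assigner are assumptions, not taken from the question):

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class WatermarkedKafkaSource {

    // Attach a watermark strategy to the Kafka consumer so that event-time
    // operators downstream can make progress.
    public static DataStream<String> withWatermarks(StreamExecutionEnvironment env,
                                                    FlinkKafkaConsumer<String> consumer) {
        consumer.assignTimestampsAndWatermarks(
                WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        // Assumption: use wall-clock time as the event timestamp; replace with a
                        // timestamp parsed from the JSON payload if true event time is needed.
                        .withTimestampAssigner((record, kafkaTimestamp) -> System.currentTimeMillis()));
        return env.addSource(consumer);
    }
}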

If this doesn't help, I would suggest decomposing/simplifying the job to a bare minimum, for example just a source operator and a naive sink that prints/logs the elements. If that works, start adding the operators back one by one. You could also start by simplifying your CEP pattern as much as possible.
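
For example, a bare-minimum version of the job could consist of just the Kafka source and a print sink; a sketch using the topic, group and broker values from the question (the job name is made up):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class BareMinimumJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "EVENT_STREAM1");

        // Source operator only: read raw strings from the input topic ...
        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("EVENT_STREAM_INPUT", new SimpleStringSchema(), props);

        // ... and a naive sink that just prints every record.
        env.addSource(consumer).print();

        env.execute("bare-minimum-kafka-print");
    }
}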

