Spark Streaming: Filtering the Streaming Data


Question

I am trying to filter the streaming data, and based on the value of the id column I want to save the data to different tables.

I have two tables:

  1. testTable_odd(id,data1,data2)
  2. testTable_even(id,data1)

If the id value is odd, I want to save the record to the testTable_odd table, and if the value is even, I want to save it to testTable_even.

The tricky part here is that my two tables have different columns. I have tried multiple ways, for example a Scala function with return type Either[obj1, obj2], but I wasn't able to succeed. Any pointers would be greatly appreciated.
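One direction I tried was to drop the Either idea and instead split the parsed stream with two filter calls, writing each side with its own column list. A rough sketch of that idea only (it assumes the msgParseMaster helper, the wordCount case class, and the testKS keyspace from my full code below):

    // sketch: split the parsed DStream by id parity and save each side to its own
    // table with its own SomeColumns list; table names come from the question
    val parsed = stream.map { case (_, msg) => msgParseMaster(msg) }

    // odd ids -> testTable_odd(id, data1, data2)
    parsed.filter(_.id % 2 != 0).foreachRDD { rdd =>
      if (!rdd.isEmpty)
        rdd.map(w => (w.id, w.data1, w.data2))
          .saveToCassandra("testKS", "testTable_odd", SomeColumns("id", "data1", "data2"))
    }

    // even ids -> testTable_even(id, data1)
    parsed.filter(_.id % 2 == 0).foreachRDD { rdd =>
      if (!rdd.isEmpty)
        rdd.map(w => (w.id, w.data1))
          .saveToCassandra("testKS", "testTable_even", SomeColumns("id", "data1"))
    }

This is what I have so far (it writes everything to a single table):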

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SaveMode
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

import com.datastax.spark.connector._
import com.datastax.spark.connector.SomeColumns

import kafka.serializer.StringDecoder

object StreamProcessor extends Serializable {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("StreamProcessor")
      .set("spark.cassandra.connection.host", "127.0.0.1")

    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, Seconds(2))
    val sqlContext = new SQLContext(sc)

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val topics = args.toSet

    // direct Kafka stream of (key, message) pairs
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // parse each message and, for now, write everything to a single table
    // (this is the part I want to split by odd/even id)
    stream
      .map { case (_, msg) =>
        val result = msgParseMaster(msg)
        (result.id, result.data1) // data1 goes into the single table's "data" column
      }
      .foreachRDD { rdd =>
        if (!rdd.isEmpty)
          rdd.saveToCassandra("testKS", "testTable", SomeColumns("id", "data"))
      }

    ssc.start()
    ssc.awaitTermination()
  }

  import org.json4s._
  import org.json4s.native.JsonMethods._

  case class wordCount(id: Long, data1: String, data2: String) extends Serializable

  implicit val formats = DefaultFormats

  // parse the raw JSON message into the case class
  def msgParseMaster(msg: String): wordCount =
    parse(msg).extract[wordCount]
}

Answer

I have performed the steps below:

  1. Extracted the details from the raw JSON string with a case class.
  2. Created a super JSON (one that has the details required by both filter criteria).
  3. Converted that JSON into a DataFrame.
  4. Performed the select and where clauses on that DataFrame.
  5. Saved it to Cassandra.
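A rough sketch of how those steps could be wired together inside foreachRDD, assuming the Spark 1.x SQLContext and SaveMode from the question's code and the two table names from the question; this illustrates the approach rather than reproducing the exact code:

    // 1) keep the raw JSON string of each Kafka message
    stream.map { case (_, msg) => msg }.foreachRDD { jsonRdd =>
      if (!jsonRdd.isEmpty) {
        // 2) + 3) read the "super" JSON (all fields) into a DataFrame
        val df = sqlContext.read.json(jsonRdd)

        // 4) select / where per filter criterion
        val odd  = df.where("id % 2 != 0").select("id", "data1", "data2")
        val even = df.where("id % 2 = 0").select("id", "data1")

        // 5) save each DataFrame to its own Cassandra table
        odd.write.format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "testKS", "table" -> "testTable_odd"))
          .mode(SaveMode.Append).save()

        even.write.format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "testKS", "table" -> "testTable_even"))
          .mode(SaveMode.Append).save()
      }
    }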
