Column data to nested json object in Spark structured streaming


Problem description

In our application we obtain the field values as columns using Spark SQL. I'm trying to figure out how to put the column values into a nested JSON object and push it to Elasticsearch. Also, is there a way to parameterise the values in selectExpr that are passed to the regex (a sketch follows the code below)?

We are currently using the Spark Java API.

// Split the pipe-delimited value and give each field its own named column
Dataset<Row> data = rowExtracted.selectExpr("split(value,\"[|]\")[0] as channelId",
                "split(value,\"[|]\")[1] as country",
                "split(value,\"[|]\")[2] as product",
                "split(value,\"[|]\")[3] as sourceId",
                "split(value,\"[|]\")[4] as systemId",
                "split(value,\"[|]\")[5] as destinationId",
                "split(value,\"[|]\")[6] as batchId",
                "split(value,\"[|]\")[7] as orgId",
                "split(value,\"[|]\")[8] as businessId",
                "split(value,\"[|]\")[9] as orgAccountId",
                "split(value,\"[|]\")[10] as orgBankCode",
                "split(value,\"[|]\")[11] as beneAccountId",
                "split(value,\"[|]\")[12] as beneBankId",
                "split(value,\"[|]\")[13] as currencyCode",
                "split(value,\"[|]\")[14] as amount",
                "split(value,\"[|]\")[15] as processingDate",
                "split(value,\"[|]\")[16] as status",
                "split(value,\"[|]\")[17] as rejectCode",
                "split(value,\"[|]\")[18] as stageId",
                "split(value,\"[|]\")[19] as stageStatus",
                "split(value,\"[|]\")[20] as stageUpdatedTime",
                "split(value,\"[|]\")[21] as receivedTime",
                "split(value,\"[|]\")[22] as sendTime"
        );
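
On the parameterisation part of the question: the delimiter regex and the field names can be kept as data and the expressions generated, instead of being written out 23 times by hand. A minimal sketch, where fieldNames and delimiterRegex are hypothetical names introduced here for illustration:

import java.util.stream.IntStream;

// Hypothetical field list; the order must match the positions in the pipe-delimited record
String[] fieldNames = {"channelId", "country", "product", "sourceId", "systemId",
        "destinationId", "batchId", "orgId", "businessId", "orgAccountId",
        "orgBankCode", "beneAccountId", "beneBankId", "currencyCode", "amount",
        "processingDate", "status", "rejectCode", "stageId", "stageStatus",
        "stageUpdatedTime", "receivedTime", "sendTime"};
String delimiterRegex = "[|]"; // the regex is now a parameter

// Build one "split(value,'<regex>')[i] as <name>" expression per field
String[] exprs = IntStream.range(0, fieldNames.length)
        .mapToObj(i -> String.format("split(value,'%s')[%d] as %s", delimiterRegex, i, fieldNames[i]))
        .toArray(String[]::new);

Dataset<Row> data = rowExtracted.selectExpr(exprs);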

StreamingQuery query = data.writeStream()
                .outputMode(OutputMode.Append())
                .format("es")
                .option("checkpointLocation", "C:\\checkpoint")
                .start("spark_index/doc");

Actual output:

{
  "_index": "spark_index",
  "_type": "doc",
  "_id": "test123",
  "_version": 1,
  "_score": 1,
  "_source": {
    "channelId": "test",
    "country": "SG",
    "product": "test",
    "sourceId": "",
    "systemId": "test123",
    "destinationId": "",
    "batchId": "",
    "orgId": "test",
    "businessId": "test",
    "orgAccountId": "test",
    "orgBankCode": "",
    "beneAccountId": "test",
    "beneBankId": "test",
    "currencyCode": "SGD",
    "amount": "53.0000",
    "processingDate": "",
    "status": "Pending",
    "rejectCode": "test",
    "stageId": "123",
    "stageStatus": "Comment",
    "stageUpdatedTime": "2019-08-05 18:11:05.999000",
    "receivedTime": "2019-08-05 18:10:12.701000",
    "sendTime": "2019-08-05 18:11:06.003000"
  }
}

We need the above columns under a node "txn_summary" such as the below json:

Expected output:

{
  "_index": "spark_index",
  "_type": "doc",
  "_id": "test123",
  "_version": 1,
  "_score": 1,
  "_source": {
    "txn_summary": {
      "channelId": "test",
      "country": "SG",
      "product": "test",
      "sourceId": "",
      "systemId": "test123",
      "destinationId": "",
      "batchId": "",
      "orgId": "test",
      "businessId": "test",
      "orgAccountId": "test",
      "orgBankCode": "",
      "beneAccountId": "test",
      "beneBankId": "test",
      "currencyCode": "SGD",
      "amount": "53.0000",
      "processingDate": "",
      "status": "Pending",
      "rejectCode": "test",
      "stageId": "123",
      "stageStatus": "Comment",
      "stageUpdatedTime": "2019-08-05 18:11:05.999000",
      "receivedTime": "2019-08-05 18:10:12.701000",
      "sendTime": "2019-08-05 18:11:06.003000"
    }
  }
}

Recommended answer

Adding all columns to a top level struct should give the expected output. In Scala:

import org.apache.spark.sql.functions.{col, struct}

// data.columns is Array[String]; map each name to a Column before splatting into struct
data.select(struct(data.columns.map(col): _*).as("txn_summary"))

In Java I would suspect it would be:

import static org.apache.spark.sql.functions.*;
import java.util.Arrays;
import org.apache.spark.sql.Column;

// data.columns() returns String[]; struct expects Columns, so convert the names first
Column[] cols = Arrays.stream(data.columns()).map(n -> col(n)).toArray(Column[]::new);
Dataset<Row> nested = data.select(struct(cols).as("txn_summary"));
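
Wiring that back into the streaming write from the question then just means starting from the nested dataset; a sketch reusing the same sink options as above:

StreamingQuery query = nested.writeStream()
        .outputMode(OutputMode.Append())
        .format("es")
        .option("checkpointLocation", "C:\\checkpoint")
        .start("spark_index/doc");
// Block the driver so the streaming query keeps running
query.awaitTermination();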
