使用Groovy进行JSON解析 [英] JSON parsing using Groovy

查看:118
本文介绍了使用Groovy进行JSON解析的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用通过NiFi调用的Groovy脚本解析JSON记录.根据我从带有重复键的嵌套JSON 获得的输入,它运行良好.但是,如果value列中包含双引号(),那么我的常规脚本就很少有记录失败.

I am trying to parse JSON records using Groovy script which is called through NiFi. Based on the input I got from Nested JSON with duplicate keys, it was working fine. But my groovy script got failed for few records if the value column has double quotes (") in it.

E.g.,
"CT":"text/javascript; charset="UTF-8""

为解决上述问题,我替换了记录中的所有双引号(),并使记录看起来像

To fix the above issue, I replaced all the double quotes (") from the record and made the record look like,

CT:text/javascript; charset=UTF-8

并在解析JSON时使用了JsonParserType.LAX.然后问题解决了.但是我遇到另一个问题,传入的JSON colunn值之间有逗号(,)

and used JsonParserType.LAX while parsing the JSON. Then the issue got resolved. But I faced another problem where incoming JSON colunn value was having comma (,) in between

UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75,app_store:googleplay,app_uuid:c6bed4

对于上述值,

After this piece of code got executed,
new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)

Output looks like,
UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML,

"UA"列的剩余值只是像下面这样单独显示,没有任何键,

and remaining values of the "UA" column is just coming separately like below without any key,

537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75

并且在解析的输出中完全错过了这部分

and this part is completely missed in the parsed output

 like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/

下面是我带有示例json记录的完整代码,

Below is my complete code with sample json record,

import groovy.json.*
import java.util.function.*
/* Example record for extra double quotes in the column value */
// def json = '{"transaction":[{"TS":"2020-02-04 07:02:55.074013","V":"v8","PID":"p31177","RS":"359","SR":"434","CnID":"0/1","CIPG":{"CIP":"10.176.63.122","CP":"35073","SLP":"35073","SLEP":"35073","CVID":"410"},"SIPG":{"SIP":"54.93.254.233","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"73","R":"73","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"73"},"AS":"365","OS":"365","CPr":"0","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"-","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"S","Flag23":"c","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"s","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"j","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"365","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"31/31/31 21 1/1 0.0/0.0","CT":"text/javascript; charset="UTF-8"","MS":"447715522518","MSH":"1C5SujOJcZs4SCCoUSQSynOXaCyND1q4TQ30kcryo/Y=","SID":"v4.172.16.146.148:9EE6EC7A17108A46","SuID":"-","UA":"okhttp/3.9.1","DID":"Default-Android-Web-Unknown","UAG":"Android-Web","CID":"TECH","HR":"-","CRG":"3001010006000000","CP1":"8232F40186F032F40107B06F6E","CP2":"3599650837621501","AIDF":"1:1386","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"GET","HD":"pubsub.pubnub.com","HP":"/v2/presence/sub-key/sub-c-ee729d78-6233-11e2-b80b-12313f022c90/channel/uk9_18395/heartbeat","HQ":"requestid=5bd3d724-84f7-4f91-865c-a141136e537f&state=%257B%257D&heartbeat=300&pnsdk=PubNub-Java-Unified/4.19.0&uuid=600752db-00ad-4478-9688-b8ee2b13414f"},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'
/* Example record for comma in the column value */
def json = '{"transaction":[{"TS":"2020-02-04 07:04:50.478718","V":"v8","PID":"p12318","RS":"4087","SR":"4162","CnID":"0/89","CIPG":{"CIP":"10.146.84.250","CP":"52106","SLP":"18518","SLEP":"18518","CVID":"410"},"SIPG":{"SIP":"104.18.14.201","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"36","R":"36","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"36"},"AS":"463","OS":"480","CPr":"-3","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"c","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"F","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"G","Flag23":"G","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"b","Flag4":"T","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"5","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"480","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"35/35/35 18 1/2 0.0/0.0","CT":"application/json;charset=UTF-8","MS":"447707123573","MSH":"G7RPssPDwUd1pKsRNs8KFtSEIZGIyJktwNDWboHmXds=","SID":"v4.172.16.138.148:9EE6E0766C39E5CC","SuID":"-","UA":"Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={"aaid":"9541535b-ab50-41c7-9ae4-4d729b501d75","app_store":"googleplay","app_uuid":"c6bed4","DID":"Android mobile SM-","UAG":"Android-Web","CID":"bm_unks","HR":"-","CRG":"0401020006000000","CP1":"8232F401C31032F4010018D678","CP2":"3562611081227201","AIDF":"6:5","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"POST","HD":"ire-d01.fordealdc.com","HP":"/api/dotData","HQ":""},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'

def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]').replace('"','').replace(':-,',':,').replace(':-}',':}')
def jsonRecord = new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)
println "After parsing: " + jsonRecord
static def flatten(row, prefix="") {
    def flattened = new HashMap<String, String>()
    println "Input: " + row
    row.each { String k, Object v ->
        def key = prefix ? prefix + "_" + k : k;

        if (v instanceof Map) {
            flattened.putAll(flatten(v, k))
        } else {
            flattened.put(key,v.toString())
        }
    }

    println "Output: " +flattened
    return flattened
}

static def toCSVRow(HashMap row) {
    def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34","GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"] 

    return columns.collect { column ->
        return row.containsKey(column) ? row.get(column) : ""
    }.join(',')
}

def a = jsonRecord.transaction.collect { row ->
    return flatten(row)
}.collect { row ->
    return toCSVRow(row).toString().replace('[','').replace(']','')
}.join('\n')

println a

我需要有关如何处理此问题的建议.我只想知道Groovy脚本如何正确解析所有JSON记录的数据.

I need advise on how to handle this. All I want to know is how my Groovy script can parse the data properly for all the JSON records.

推荐答案

正如对您的问题的评论所述,您尝试解析的内容不是JSON.最好的方法是从头开始,找到数据的共同点.

As the comments to your question mention, the content that you’re attempting to parse is not JSON. The best approach is to start from scratch and find the commonalities in the data.

如果每个名称/值对在单独的行上(用/r或/n分隔),则对内容进行换行以获取列表.

If each name/value pair is on a separate line (separated by /r or /n) then line break the content to get a list.

由于每对使用'='分隔它们,因此接下来遍历该列表并使用分隔符断开该行.修剪每一面,您就有名称/值对.

Since each pair uses an ‘=‘ to separate them, next iterate through that list and break the line using the separator. Trim each side and you have your name/value pair.

您需要遍历所得到的地图并进行解析,以查看是否定义了数组或对象,然后对数据进行处理以适应该情况.

You’ll need to walk through the map you get as a result and parse it to see if there are arrays or objects being defined and massage the data to accommodate that.

这篇关于使用Groovy进行JSON解析的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆