Rename invalid keys from JSON


Question

I have the following flow in NiFi; the JSON has 1000+ objects in it.

InvokeHTTP -> SplitJson -> PutMongo

The flow works fine until I receive keys in the JSON with "." in the name, e.g. "spark.databricks.acl.dfAclsEnabled".

My current solution is not optimal: I have jotted down the bad keys and use multiple ReplaceText processors to replace "." with "_". I am not using regex, just string-literal find/replace, so each time the PutMongo processor fails, I insert a new ReplaceText processor.

This is not maintainable. I am wondering if I can use JOLT for this? A couple of notes regarding the input JSON:

1) There is no set structure; the only thing that is confirmed is that everything will be in an events array. The event objects themselves are free-form.

2) Maximum list size = 1000.

3) It is third-party JSON, so I can't ask for a change in the format.

Also, keys with "." can appear anywhere, so I am looking for a JOLT spec that can cleanse at all levels and then rename the keys.

{
  "events": [
    {
      "cluster_id": "0717-035521-puny598",
      "timestamp": 1531896847915,
      "type": "EDITED",
      "details": {
        "previous_attributes": {
          "cluster_name": "Kylo",
          "spark_version": "4.1.x-scala2.11",
          "spark_conf": {
            "spark.databricks.acl.dfAclsEnabled": "true",
            "spark.databricks.repl.allowedLanguages": "python,sql"
          },
          "node_type_id": "Standard_DS3_v2",
          "driver_node_type_id": "Standard_DS3_v2",
          "autotermination_minutes": 10,
          "enable_elastic_disk": true,
          "cluster_source": "UI"
        },
        "attributes": {
          "cluster_name": "Kylo",
          "spark_version": "4.1.x-scala2.11",
          "node_type_id": "Standard_DS3_v2",
          "driver_node_type_id": "Standard_DS3_v2",
          "autotermination_minutes": 10,
          "enable_elastic_disk": true,
          "cluster_source": "UI"
        },
        "previous_cluster_size": {
          "autoscale": {
            "min_workers": 1,
            "max_workers": 8
          }
        },
        "cluster_size": {
          "autoscale": {
            "min_workers": 1,
            "max_workers": 8
          }
        },
        "user": ""
      }
    },
    {
      "cluster_id": "0717-035521-puny598",
      "timestamp": 1535540053785,
      "type": "TERMINATING",
      "details": {
        "reason": {
          "code": "INACTIVITY",
          "parameters": {
            "inactivity_duration_min": "15"
          }
        }
      }
    },
    {
      "cluster_id": "0717-035521-puny598",
      "timestamp": 1535537117300,
      "type": "EXPANDED_DISK",
      "details": {
        "previous_disk_size": 29454626816,
        "disk_size": 136828809216,
        "free_space": 17151311872,
        "instance_id": "6cea5c332af94d7f85aff23e5d8cea37"
      }
    }
  ]
}

Answer

I created a template using ReplaceText and RouteOnContent to perform this task. The loop is required because the regex only replaces the first "." in each JSON key on each pass. You might be able to refine this to perform all substitutions in a single pass, but after fuzzing the regex with look-ahead and look-behind groups for a few minutes, re-routing was faster. I verified this works with the JSON you provided, and also with JSON where the keys and values are on different lines (with the ":" on either line):

...
"spark_conf": {
    "spark.databricks.acl.dfAclsEnabled":
"true",
    "spark.databricks.repl.allowedLanguages"
: "python,sql"
},
...
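The actual regex used in the template is not shown in the answer; as a sketch of the looping logic, here is a Python version with a hypothetical pattern that matches a quoted key still containing a dot, followed by a colon (the `\s*` lets the colon sit on the next line, matching the multi-line variant above):

```python
import re

# Hypothetical stand-in for the template's ReplaceText pattern:
# a quoted key with a dot, followed (possibly across lines) by ':'.
KEY_WITH_DOT = re.compile(r'"([^".]*)\.([^"]*)"(\s*:)')

def sanitize_keys(raw_json: str) -> str:
    """Replace '.' with '_' in JSON keys, one dot per key per pass.

    Each substitution fixes only the first remaining dot in a key,
    so the while-loop plays the role of the RouteOnContent feedback
    connection: keep re-applying ReplaceText until nothing changes.
    """
    previous = None
    while previous != raw_json:
        previous = raw_json
        raw_json = KEY_WITH_DOT.sub(r'"\1_\2"\3', raw_json)
    return raw_json
```

Because this is a text-level replacement, it relies on the fact that in well-formed JSON only keys are followed by a colon; string values containing dots are left alone.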

You could also use an ExecuteScript processor with Groovy to ingest the JSON, quickly filter all keys that contain ".", perform a collect operation to do the replacement, and re-insert the keys into the JSON data, if you want a single processor to do this in a single pass.
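The Groovy script itself is not included in the answer; as a rough sketch of the same idea (parse the JSON, walk the tree, and rename every dotted key at any depth in one pass), here it is in Python:

```python
import json

def rename_keys(node):
    """Recursively replace '.' with '_' in every key, at any depth.

    Dicts get their keys rewritten, lists are walked element by
    element, and scalar values pass through unchanged.
    """
    if isinstance(node, dict):
        return {key.replace('.', '_'): rename_keys(value)
                for key, value in node.items()}
    if isinstance(node, list):
        return [rename_keys(item) for item in node]
    return node

def cleanse(raw_json: str) -> str:
    """Parse, cleanse, and re-serialize in a single pass."""
    return json.dumps(rename_keys(json.loads(raw_json)))
```

Because the free-form event objects are handled by recursion rather than by a fixed spec, this approach does not need to know the structure in advance, unlike a JOLT spec.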

