Using ReplaceTextWithMapping with multiple columns in mapping file


Problem description


I need to clarify the usage of ReplaceTextWithMapping in NiFi for my specific case. My input file looks like this:

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

The mapping file looks like this:

 Header1;Header2;Header3
 A;some text;2

My expected result would be as follows:

   {"field1" : "some text",
    "field2": "A",
    "field3": "A2"
    }

The Regular Expression is simply set as follows:

[A-Z0-9]+

and it matches the field key in the mapping file (we are expecting either a capital letter or a capital letter followed by digits), but I am not sure how you decide which value (from column 2 or from column 3) the input value gets assigned. Also, my field2 should not change and needs to retain the value it has in the input, with no mapping involved. At the moment, I am getting something like this:

  {"field1" : "some text A2",
    "field2": "some text A2",
    "field3": "some text A2"
    }

I guess my main question is: can you map the same value in your input file to different values coming from different columns of your mapping file?

Thank you

EDIT: I am using ReplaceTextWithMapping, an out-of-the-box processor in Apache NiFi (v. 0.5.1). In my dataflow, I end up with a JSON file to which I need to apply some mappings coming from external files that I would like to load in memory (rather than parse using ExtractText, for example).

Solution

Foreword

It appears that you're working with a JSON string. Such a string is easier to handle with a JSON parsing engine, because the JSON structure allows edge cases that are difficult to parse with regular expressions. With that said, I'm sure you have your reasons, and I'm not the Regex Police.
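For comparison, here is a minimal sketch of the JSON-parsing route mentioned above, written in Python and run outside NiFi. It assumes the mapping file is semicolon-delimited with the header row shown in the question. The names `FIELD_RULES`, `load_mapping` and `apply_mapping` are hypothetical helpers made up for this illustration, not part of NiFi or of the original answer.

```python
import csv
import io
import json

# Sample data copied from the question (leading indentation removed).
MAPPING_TEXT = "Header1;Header2;Header3\nA;some text;2\n"
SOURCE_TEXT = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'


def load_mapping(text):
    """Index the semicolon-delimited mapping rows by their first column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return {row["Header1"]: row for row in reader}


# Hypothetical per-field rules: each one says which mapping column feeds
# the field. Fields without a rule (field2 here) are left untouched.
FIELD_RULES = {
    "field1": lambda key, row: row["Header2"],
    "field3": lambda key, row: key + row["Header3"],
}


def apply_mapping(record, mapping):
    """Return a copy of record with the per-field rules applied."""
    result = dict(record)
    for field, rule in FIELD_RULES.items():
        key = record.get(field)
        if key in mapping:
            result[field] = rule(key, mapping[key])
    return result


mapped = apply_mapping(json.loads(SOURCE_TEXT), load_mapping(MAPPING_TEXT))
print(json.dumps(mapped))
```

Because the rule is chosen by field name rather than by the matched value, the same input value `A` can end up as different outputs in different fields, which is what the question asks for.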

Description

To do such a replacement, it is easier to capture both the substrings you want to keep and the substrings you want to replace.

(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)("[,\r\n]+\})

Replace with: $1SomeText$3$4$5A2$7

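Note: I recommend using the following flags with this expression: Case Insensitive, and Dot matches all characters including new lines. Case Insensitive is the one that matters here, because the character classes are written in lowercase while the sample values are uppercase.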
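As a rough check outside NiFi, the expression and replacement above can be tried with Python's `re` module. This is only a sketch: Python writes group references as `\g<1>` rather than the `$1` form used above, and the `source` string is the sample input from the question.

```python
import re

# The expression from the answer, split across lines for readability.
PATTERN = re.compile(
    r'(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+\})',
    re.IGNORECASE,  # the sample values are uppercase
)

# Python equivalent of "$1SomeText$3$4$5A2$7":
# groups 2 and 6 are replaced, group 4 (the field2 value) is kept.
REPLACEMENT = r"\g<1>SomeText\g<3>\g<4>\g<5>A2\g<7>"

source = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'
result = PATTERN.sub(REPLACEMENT, source)
print(result)
```

Note that the pattern is tied to exactly three fields in this layout; a different number of fields or extra whitespace after a comma would stop it from matching.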

Examples

Live Demo

This example shows how the regular expression matches against your source text: https://regex101.com/r/vM1qE2/1

Source Text

{"field1" : "A",
"field2" : "A",
"field3": "A"
}

After Replacement

{"field1" : "SomeText",
"field2" : "A",
"field3": "A2"
}

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \{                       '{'
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    [a-z0-9]+                any character of: 'a' to 'z', '0' to '9'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
    [,\r\n]+                 any character of: ',', '\r' (carriage
                             return), '\n' (newline) (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    \}                       '}'
----------------------------------------------------------------------
  )                        end of \7
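
To see what each of the seven groups in the table captures on the sample input, a short Python sketch (again run outside NiFi, with the case-insensitive flag assumed) can print them one by one:

```python
import re

PATTERN = re.compile(
    r'(\{"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+"[a-z0-9]+"\s*:\s*")([a-z0-9]+)'
    r'("[,\r\n]+\})',
    re.IGNORECASE,
)

source = '{"field1" : "A",\n"field2" : "A",\n"field3": "A"\n}'
match = PATTERN.fullmatch(source)
groups = match.groups() if match else ()

# Odd-numbered groups hold the surrounding JSON text that is kept;
# even-numbered groups hold the three field values.
for number, text in enumerate(groups, start=1):
    print(number, repr(text))
```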
