Extract attributes from XML in NiFi
Problem description
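I have XML files that I get from FTP (using the ListFTP and FetchFTP processors). I want to extract the values from each XML file and replace the file content with those values, as if it were a CSV (and then put the files back on the FTP server with the PutFTP processor).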
The desired output looks like this:
{"foodate":"somedate","name":"fooid1_foovalue","value":5.44}
{"foodate":"somedate","name":"fooid1_metrics","value":"some-metrics"}
.
.
.
{"foodate":"somedate","name":"fooid2_foovalue","value":2.34}
.
.
.
So, for each id, first write the foodate attribute, then Id1_sampleattribute1, Id1_sampleattribute2, and so on.
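A minimal Java sketch of the record format described above (Java rather than Groovy, but the logic ports directly to an ExecuteScript body). The class name `JsonLine` and its method names are hypothetical. It assumes a value is written as a bare JSON number only when it is a valid JSON number literal, and as a quoted string otherwise, since a non-numeric value such as `some-metrics` is only valid JSON when quoted.

```java
import java.util.regex.Pattern;

class JsonLine {
    // Matches a JSON number literal (RFC 8259 grammar); anything else is emitted as a string.
    private static final Pattern JSON_NUMBER =
            Pattern.compile("-?(0|[1-9]\\d*)(\\.\\d+)?([eE][+-]?\\d+)?");

    // Wraps a string in double quotes, escaping the characters JSON requires.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Numbers stay bare (5.44), everything else is quoted ("some-metrics").
    static String value(String raw) {
        return JSON_NUMBER.matcher(raw).matches() ? raw : quote(raw);
    }

    // One output record: {"foodate":...,"name":"<id>_<attribute>","value":...}
    static String record(String foodate, String id, String attribute, String rawValue) {
        return "{\"foodate\":" + quote(foodate)
                + ",\"name\":" + quote(id + "_" + attribute)
                + ",\"value\":" + value(rawValue) + "}";
    }

    public static void main(String[] args) {
        // prints {"foodate":"somedate","name":"fooid1_foovalue","value":5.44}
        System.out.println(record("somedate", "fooid1", "foovalue", "5.44"));
        // prints {"foodate":"somedate","name":"fooid1_metrics","value":"some-metrics"}
        System.out.println(record("somedate", "fooid1", "metrics", "some-metrics"));
    }
}
```

Writing one object per line (JSON Lines) keeps each record independent, so a downstream processor can split or stream the file without parsing it as a single array.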
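However, I never know the names or the number of the attributes in advance; I only know that the first Sample attribute is foodate. Any idea how to handle this? I tried the ExecuteScript processor with JavaScript, but it doesn't seem to recognize DOMParser() and the like.
<?xml version="1.0" encoding="ISO-8859-1"?>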
<Document Version="2">
<ExportData lowerBound="2021/11/24 16:58:26" upperBound="2021/11/24 22:58:26">
<Site name="name" f="">
<Kapta fooid1="some-number">
<Infos>
<Info>
<EndPoint foo="value-name" />
</Info>
</Infos>
<Samples ordering="desc">
<Sample foodate="some-date" foovalue="5.44" metrics="some-metrics" metrics2="metrics-again" value="numbers5" te="numbers" />
<Sample foodate="some-date" foovalue="7.45" foom="some-metrics" metrics453="metrics-again" otherattribut="numbers5" att345="numbers" morevalues="numbers" foohdeiurf="numbers" hello="numbers"/>
</Samples>
</Kapta>
<Kapta fooid2="some-number">
<Infos>
<Info>
<EndPoint foo="value-name" />
</Info>
</Infos>
<Samples ordering="desc">
<Sample foodate="some-date" foovalue="2.34" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbersagain" />
<Sample foodate="some-date" foo="99.8" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbers" />
<Sample foodate="some-date" attr="234.56" someothermetrics="some-metrics" metr="metrics-again" anothervalue="numbers" />
</Samples>
</Kapta>
</Site>
</ExportData>
</Document>
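A standalone Java sketch of the traversal for an input like the one above, using the same `javax.xml.parsers` / `org.w3c.dom` API that Groovy scripts in ExecuteScript can also call. It collects (foodate, name, value) triples and leaves serialization as a separate step. Assumptions, not taken from the question: the id is the *name* of the Kapta element's single attribute (e.g. `fooid1`), every Sample attribute other than `foodate` becomes one entry, and the names `KaptaSamples`, `extract` and `Entry` are hypothetical. DOM's `NamedNodeMap` is not guaranteed to keep document order (attribute order is not significant in XML), so the entries for one Sample may come out in a different order than in the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class KaptaSamples {

    // One extracted value: the Sample's foodate, "<kaptaAttr>_<sampleAttr>", and the raw value.
    static final class Entry {
        final String foodate, name, value;

        Entry(String foodate, String name, String value) {
            this.foodate = foodate;
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            return foodate + " | " + name + " | " + value;
        }
    }

    static List<Entry> extract(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<Entry> out = new ArrayList<>();
        NodeList kaptas = doc.getElementsByTagName("Kapta");
        for (int k = 0; k < kaptas.getLength(); k++) {
            Element kapta = (Element) kaptas.item(k);
            NamedNodeMap kaptaAttrs = kapta.getAttributes();
            if (kaptaAttrs.getLength() == 0) continue; // no id attribute to prefix with
            // Assumption: the id is the name of the Kapta's single attribute, e.g. "fooid1".
            String id = kaptaAttrs.item(0).getNodeName();
            NodeList samples = kapta.getElementsByTagName("Sample");
            for (int s = 0; s < samples.getLength(); s++) {
                Element sample = (Element) samples.item(s);
                String foodate = sample.getAttribute("foodate"); // "" if absent
                NamedNodeMap attrs = sample.getAttributes();
                // Attribute names are discovered at runtime; nothing is hard-coded except foodate.
                for (int a = 0; a < attrs.getLength(); a++) {
                    Node attr = attrs.item(a);
                    if (attr.getNodeName().equals("foodate")) continue;
                    out.add(new Entry(foodate, id + "_" + attr.getNodeName(), attr.getNodeValue()));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<Document Version=\"2\"><ExportData><Site name=\"name\">"
                + "<Kapta fooid1=\"some-number\"><Samples ordering=\"desc\">"
                + "<Sample foodate=\"some-date\" foovalue=\"5.44\" metrics=\"some-metrics\"/>"
                + "</Samples></Kapta>"
                + "<Kapta fooid2=\"some-number\"><Samples ordering=\"desc\">"
                + "<Sample foodate=\"some-date\" foovalue=\"2.34\" te=\"numbersagain\"/>"
                + "</Samples></Kapta></Site></ExportData></Document>";
        extract(xml).forEach(System.out::println);
    }
}
```

Because the id and every sample attribute name are read from the document rather than hard-coded, new or renamed attributes flow through without changes to the code.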
Thanks a lot for your time and effort!
Recommended answer
You can use Groovy's XML parsing libraries. There are several options depending on your needs; check this.
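Here is some experimental code that reads the XML from the content of the incoming flow file and writes some of the extracted values as JSON, one object per line. You can build on it as needed.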
Note that this code may not be production-grade. For more on Groovy in NiFi, see the ExecuteScript cookbook.
import org.apache.nifi.flowfile.FlowFile
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilder
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document
import groovy.xml.dom.DOMCategory
import groovy.json.JsonGenerator

def flowFile = session.get()
if (!flowFile) return  // nothing queued on this trigger

try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance()
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder()
    Document doc = null
    // Parse the flow file content into a DOM document
    session.read(flowFile, { inputStream ->
        doc = dBuilder.parse(inputStream)
    } as InputStreamCallback)

    def root = doc.documentElement
    def sb = new StringBuilder()
    def jsonGenerator = new JsonGenerator.Options().disableUnicodeEscaping().build()

    // Get a specific attribute: "foo" of every child element of Kapta/Infos/Info
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Infos']['Info']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            data.foo = node['@foo']
            sb.append(jsonGenerator.toJson(data)).append('\n')
        }
    }

    // Get all attributes of every Sample under Samples, one JSON line per attribute
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Samples']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            def attributesMap = node.attributes()
            for (int x = 0; x < attributesMap.getLength(); x++) {
                data.AttrName = attributesMap.item(x).getNodeName()
                data.AttrValue = attributesMap.item(x).getNodeValue()
                sb.append(jsonGenerator.toJson(data)).append('\n')
            }
        }
    }

    // Replace the flow file content with the generated JSON lines
    flowFile = session.write(flowFile, { inputStream, outputStream ->
        outputStream.write(sb.toString().getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)
    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception e) {
    log.error('Failed to extract attributes from XML', e)
    session.transfer(flowFile, REL_FAILURE)
}