Extract attributes from XML in NiFi


Problem description

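I have these XML files that I fetch from FTP (using the ListFTP and FetchFTP processors). I want to extract the values from each XML file and replace the file content with those values, as if it were a CSV. (Then I put the files back on the FTP server with the PutFTP processor.)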

The desired output looks like this:

{"foodate":"somedate","name":"fooid1_foovalue","value":5.44}
{"foodate":"somedate","name":"fooid1_metrics","value":some-metrics}
.
.
.
{"foodate":"somedate","name":"fooid2_foovalue","value":2.34}
.
.
.

So for each id, the foodate attribute is written first, followed by id1 with sample attribute 1, id1 with sample attribute 2, and so on.

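However, I never know the names or the number of the attributes in advance; I only know that the first Sample attribute is foodate. Any idea how to handle this? I tried the ExecuteScript processor with JavaScript, but it does not seem to recognize DOMParser() and similar browser APIs.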

<?xml version="1.0" encoding="ISO-8859-1"?>
<Document Version="2">
    <ExportData lowerBound="2021/11/24 16:58:26" upperBound="2021/11/24 22:58:26">
        <Site name="name" f="">
            <Kapta fooid1="some-number">
                <Infos>
                    <Info>
                        <EndPoint foo="value-name" />
                    </Info>
                </Infos>
                <Samples ordering="desc">
                    <Sample foodate="some-date" foovalue="5.44" metrics="some-metrics" metrics2="metrics-again" value="numbers5" te="numbers" />
                    <Sample foodate="some-date" foovalue="7.45" foom="some-metrics" metrics453="metrics-again" otherattribut="numbers5" att345="numbers" morevalues="numbers" foohdeiurf="numbers" hello="numbers"/>
                </Samples>
            </Kapta>
            <Kapta fooid2="some-number">
                <Infos>
                    <Info>
                        <EndPoint foo="value-name" />
                    </Info>
                </Infos>
                <Samples ordering="desc">
                    <Sample foodate="some-date" foovalue="2.34" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbersagain" />
                    <Sample foodate="some-date" foo="99.8" metrics="some-metrics" metrics2="metrics-again" value="numbers" te="numbers" />
                    <Sample foodate="some-date" attr="234.56" someothermetrics="some-metrics" metr="metrics-again" anothervalue="numbers" />
                </Samples>
            </Kapta>
        </Site>
    </ExportData>
</Document>

Thanks a lot for your time and effort!

Recommended answer

You can use one of the Groovy XML parser libraries. There are several options to choose from, depending on your needs.

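Here is some experimental code that reads the XML from the content of the incoming flow file and writes a few extracted values as a list of JSON objects, one per line. You can develop it further to suit your needs.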

Note that this code may not be production grade. For more on Groovy in NiFi, see the ExecuteScript cookbook.

import org.apache.nifi.flowfile.FlowFile;
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import org.apache.nifi.processor.io.StreamCallback
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import groovy.xml.dom.DOMCategory
import groovy.json.JsonGenerator

def flowFile

try {
    
    flowFile = session.get()
    // Nothing to do when no flow file is queued
    if (!flowFile) return
    
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = null

    session.read(flowFile, {inputStream ->
        doc =  dBuilder.parse(inputStream)
    } as InputStreamCallback)
    
    def root = doc.documentElement
    def sb = new StringBuilder()
    def jsonGenerator = new JsonGenerator.Options().disableUnicodeEscaping().build()
    
    // get a specific attribute (foo) from every child of Info, e.g. EndPoint
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Infos']['Info']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            data.foodate = node['@foo']
            sb.append(jsonGenerator.toJson(data))
            sb.append('\n')
        }
    }
    
    // get all attributes of Sample under Samples, one JSON line per attribute
    use(DOMCategory) {
        root['ExportData']['Site']['Kapta']['Samples']['*'].each { node ->
            def data = new LinkedHashMap()
            data.NodeName = node.name()
            def attributesMap = node.attributes()
            for (int x = 0; x < attributesMap.getLength(); x++) {
                data.AttrName = attributesMap.item(x).getNodeName()
                data.AttrValue = attributesMap.item(x).getNodeValue()
                sb.append(jsonGenerator.toJson(data))
                sb.append('\n')
            }
        }
    }
    
    flowFile = session.write(flowFile, {inputStream, outputStream ->
        outputStream.write(sb.toString().getBytes(StandardCharsets.UTF_8))
    } as StreamCallback)
    
    session.transfer(flowFile, REL_SUCCESS)
    
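} catch (Exception e) {
    log.error('Failed to extract attributes from XML', e)
    // flowFile is null when session.get() returned nothing
    if (flowFile) session.transfer(flowFile, REL_FAILURE)
}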
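
The script above emits generic NodeName/AttrName/AttrValue records rather than the exact lines shown in the question. The following is a minimal sketch of how the requested shape could be produced. It makes several assumptions that the original answer does not confirm: `samplesToJsonLines` is a hypothetical helper name, the name of each Kapta's first attribute (fooid1, fooid2, ...) is taken as the id prefix, `foodate` is looked up by name rather than by position, and values that parse as numbers are written unquoted. It uses only the JDK DOM API and `groovy.json.JsonOutput`, so it can be tried outside NiFi first.

```groovy
import groovy.json.JsonOutput
import javax.xml.parsers.DocumentBuilderFactory
import org.xml.sax.InputSource

// Hypothetical helper: returns one JSON line per Sample attribute (except foodate).
List<String> samplesToJsonLines(String xml) {
    def doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)))
    def lines = []
    def kaptas = doc.getElementsByTagName('Kapta')
    for (int k = 0; k < kaptas.length; k++) {
        def kapta = kaptas.item(k)
        if (kapta.attributes.length == 0) continue
        // Assumption: the attribute *name* on Kapta (fooid1, fooid2, ...) is the id prefix
        def id = kapta.attributes.item(0).nodeName
        def samples = kapta.getElementsByTagName('Sample')
        for (int s = 0; s < samples.length; s++) {
            def sample = samples.item(s)
            def date = sample.getAttribute('foodate')
            def attrs = sample.attributes
            // DOM does not guarantee document order for attributes
            for (int a = 0; a < attrs.length; a++) {
                def attr = attrs.item(a)
                if (attr.nodeName == 'foodate') continue
                def raw = attr.nodeValue
                // Assumption: numeric-looking values are emitted as JSON numbers
                def value = raw.isNumber() ? raw.toBigDecimal() : raw
                lines << JsonOutput.toJson([foodate: date,
                                            name   : id + '_' + attr.nodeName,
                                            value  : value])
            }
        }
    }
    return lines
}
```

Inside ExecuteScript, one way to use it would be to read the flow file content into a string, call the helper, and write `lines.join('\n')` back with `session.write`, in the same way the script above writes its StringBuilder.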




   
