How can I browse and list the XPaths of an XML message?


Question

**** Please see the "Edit" section below:

Thanks for looking into this issue. I am not sure whether this is the right forum for this thread. If not, please let me know the right forum to post it in.

We have a complex XML message (data in XML format). We are exploring a way to extract all the XPaths of this XML message together with its element-level and attribute-level data content. We tried XMLSPY and xmltwig, but had no luck. xml_grep pulls data if we give it an XPath as input, but it has no option to browse all the XPaths of an XML message.

I have a well-formed XML message. I want to produce a list/report of:

  1. All XPaths of the XML message (browse all XPaths of the XML message and list them)

  2. Each XPath and the data content for that XPath (browse all XPaths and their data content, and list both for the XML message)

Here is an example (input XML message):

<?xml version="1.0"?>
<PARTS>
<TITLE>Computer Parts</TITLE>
<PART>
<ITEM>Motherboard</ITEM>
<MANUFACTURER>ASUS</MANUFACTURER>
<MODEL>P3B-F</MODEL>
<COST> 123.00</COST>
</PART>
<PART>
<ITEM>Video Card</ITEM>
<MANUFACTURER>ATI</MANUFACTURER>
<MODEL>All-in-Wonder Pro</MODEL>
<COST> 160.00</COST>
</PART>
<PART>
<ITEM>Sound Card</ITEM>
<MANUFACTURER>Creative Labs</MANUFACTURER>
<MODEL>Sound Blaster Live</MODEL>
<COST> 80.00</COST>
</PART>
<PART>
<ITEM>inch Monitor</ITEM>
<MANUFACTURER>LG Electronics</MANUFACTURER>
<MODEL> 995E</MODEL>
<COST> 290.00</COST>
</PART>
</PARTS>

The desired output (I created the following list manually):

/PARTS/TITLE                Computer Parts
/PARTS/PART[1]/ITEM         Motherboard
/PARTS/PART[1]/MANUFACTURER ASUS
/PARTS/PART[1]/MODEL        P3B-F
/PARTS/PART[1]/COST         123.00
/PARTS/PART[2]/ITEM         Video Card
/PARTS/PART[2]/MANUFACTURER ATI
............
..............
..................
...................

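Is there any open-source product that generates this kind of report for an XML message?

What is the method for extracting the XPaths and the data content at each XPath?

Thanks for letting me pick the brains of this forum.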

+++++

Thanks. The code above outputs:

Field|Value
/*|

/*/*[1]|X
/*/*[2]|000000000
/*/*[3]|000000000
/*/*[4]|&
/*/*[5]|

I am not able to get the XPath as text (with element names).

Here is the input XML:

<CorrectedW2Ind>X</CorrectedW2Ind>
<EmployeeSSN>000000000</EmployeeSSN>
<EmployerEIN>000000000</EmployerEIN>
<EmployerNameControlTxt>&amp;</EmployerNameControlTxt>
<EmployerName>
    <BusinessNameLine1Txt>#</BusinessNameLine1Txt>
    <BusinessNameLine2Txt>#</BusinessNameLine2Txt>
</EmployerName>
<EmployerUSAddress>
    <AddressLine1Txt>0</AddressLine1Txt>
    <AddressLine2Txt>0</AddressLine2Txt>
    <CityNm>A</CityNm>
    <StateAbbreviationCd>PW</StateAbbreviationCd>
    <ZIPCd>00000</ZIPCd>
</EmployerUSAddress>

<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
    <EmployersUseCd>A</EmployersUseCd>
    <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
    <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>

a) Which lxml method should I use, with the code above, to get the XPath (as text) and its value?

b) Which lxml method should I use to get an aggregate count for repeating group nodes?

For example: XPath of EmployersUseGrp ====> 5
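
For question (b), one possible approach is to leave out the positional indexes and count how often each plain path occurs. The sketch below uses Python's standard-library `xml.etree.ElementTree` rather than lxml; the function name `count_paths` and the `W2` wrapper element in the sample are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_paths(xml_text):
    """Count how many elements share each index-free path."""
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


# Hypothetical sample: a W2 wrapper around five repeated EmployersUseGrp nodes.
sample = (
    "<W2>"
    + "<EmployersUseGrp><EmployersUseCd>A</EmployersUseCd></EmployersUseGrp>" * 5
    + "</W2>"
)

counts = count_paths(sample)
for path, n in counts.items():
    if n > 1:  # report only paths that repeat
        print(path, "====>", n)
```

Elements nested inside a repeating group are counted as well, so `EmployersUseCd` is reported alongside its parent group.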

EDIT ===== June 26, 2019 ======================

I am not able to open new questions; I get a "question limit exceeded" message. So I am posting the follow-up to this code here.

I am trying to use the Python code from the posted answer, and I am getting weird output.

I have a large XML file (inputf.xml). I used this file as input = inputf.xml in the posted code.




    <?xml version="1.0" encoding="UTF-8"?>
      <DataFileFor>
        <DataR>
           <Id>5070022019330a0050hq</Id>
             <NUM>30221730001019</NUM>
             <Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
             <TNO>47647</TNO>
.
.
.
.
.
</DataFileFor>

++++

When I grab the node at this XPath with xml_grep, I get the following.

xml_grep DataFileFor/DataR/Ret/W2 inputf.xml ===> output


<?xml version="1.0" ?>
<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">
<file filename="inputf.xml">
  <W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
    <CorrectedW2Ind>X</CorrectedW2Ind>
    <EmployeeSSN>000000000</EmployeeSSN>
    <EmployerEIN>000000000</EmployerEIN>
    <EmployerNameControlTxt>S</EmployerNameControlTxt>
    <EmployerName>
      <BusinessNameLine1Txt>String</BusinessNameLine1Txt>
      <BusinessNameLine2Txt>String</BusinessNameLine2Txt>
    </EmployerName>
    <EmployerUSAddress>
      <AddressLine1Txt>String</AddressLine1Txt>
      <AddressLine2Txt>String</AddressLine2Txt>
      <CityNm>String</CityNm>
      <StateAbbreviationCd>AL</StateAbbreviationCd>
      <ZIPCd>000000000</ZIPCd>
.
.
.
.
.
</W2>

When I use this code, it does not produce readable XPaths. The output XPaths look like this:


/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[10]|X
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[11]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[12]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[13]|S
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[14]|String

The attributes

Id="W2" dName="W2" sId="00000000" sVersionNum="String"

do not show up in the output.

What changes to the code are required to fix this?

Thanks for your guidance.
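
The `*` steps probably mean that those elements live in an XML namespace: as far as I know, lxml's `getpath` falls back to `*` for namespaced elements. The posted loop also writes only `e.text`, so attributes are never emitted. A minimal sketch of one workaround, using the standard-library `xml.etree.ElementTree` instead of lxml, builds each path from local tag names and adds one `/@name` row per attribute. The helper names `local_name` and `list_paths`, the namespace URI `urn:example`, and the sample document are assumptions for illustration, not part of the posted code.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def list_paths(xml_text):
    """Return (path, value) rows for every element and attribute."""
    rows = []

    def walk(elem, path):
        rows.append((path, (elem.text or "").strip()))
        for name, value in elem.attrib.items():
            rows.append((path + "/@" + local_name(name), value))
        # Number a child only when a sibling shares its local name.
        totals = Counter(local_name(c.tag) for c in elem)
        seen = Counter()
        for child in elem:
            name = local_name(child.tag)
            seen[name] += 1
            step = name + ("[%d]" % seen[name] if totals[name] > 1 else "")
            walk(child, path + "/" + step)

    root = ET.fromstring(xml_text)
    walk(root, "/" + local_name(root.tag))
    return rows


# Hypothetical sample with a default namespace on W2.
sample = """<DataR>
  <W2 xmlns="urn:example" Id="W2" sVersionNum="String">
    <CorrectedW2Ind>X</CorrectedW2Ind>
    <EmployeeSSN>000000000</EmployeeSSN>
  </W2>
</DataR>"""

rows = list_paths(sample)
for path, value in rows:
    print("%s|%s" % (path, value))
```

Because the namespace is dropped, these paths are readable labels for a report; to evaluate them as XPath against a namespaced document you would still need to bind a prefix to the namespace.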

Recommended answer

I just saw this. I wrote something in Python that does this; it outputs a pipe-delimited CSV file. Feel free to use it. Happy to answer any questions, but don't expect an immediate response.

from lxml import etree


def parseXML(xmlFile, outputFile):
    """
    Write the XPath and text of every element to a pipe-delimited file.
    """
    # Parse straight from the file so lxml handles the encoding declaration;
    # etree.fromstring() rejects a str that carries one.
    tree = etree.parse(xmlFile)
    root = tree.getroot()

    with open(outputFile, 'w') as f:  # closed automatically, even on error
        f.write("%s|%s\n" % ("Field", "Value"))
        for e in root.iter():
            # e.text is None for empty elements; write an empty value instead
            text = (e.text or "").strip()
            f.write("%s|%s\n" % (tree.getpath(e), text))


if __name__ == "__main__":
    print('Loading variables...')
    input = '16a.xml'
    output = input + '.csv'

    parseXML(input, output)
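
For a large input file, a streaming variant may be worth considering. The sketch below is not the lxml code above: it swaps in the standard library's `xml.etree.ElementTree.iterparse` and builds each path by hand, so finished elements can be cleared as the parse proceeds. The name `iter_paths` and the inline sample are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def iter_paths(source):
    """Yield (path, text) for each element while streaming the document."""
    steps = []              # path steps of the currently open elements
    counters = [Counter()]  # per open element: child tags seen so far
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            counters[-1][elem.tag] += 1
            steps.append("%s[%d]" % (elem.tag, counters[-1][elem.tag]))
            counters.append(Counter())
        else:
            # The element is complete at its "end" event, so text is final.
            yield "/" + "/".join(steps), (elem.text or "").strip()
            steps.pop()
            counters.pop()
            elem.clear()  # drop the finished element's text and children


# Hypothetical sample, shortened from the PARTS example in the question.
sample = (
    b"<PARTS><TITLE>Computer Parts</TITLE>"
    b"<PART><ITEM>Motherboard</ITEM></PART>"
    b"<PART><ITEM>Video Card</ITEM></PART></PARTS>"
)

rows = list(iter_paths(io.BytesIO(sample)))
for path, text in rows:
    print("%s|%s" % (path, text))
```

Two trade-offs follow from streaming: every step carries an index (for example `/PARTS[1]/PART[2]/ITEM[1]`), because the parser cannot know in advance whether a later sibling will share the tag, and a parent is reported after its children, since a row is emitted only when the element closes.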
