How can I browse and list the XPaths of an XML message?
Question
**** Please see the "Edit" section below:
Thanks for looking into this issue. I am not sure whether this is the right forum for this thread. If it is not, please let me know the right one.
We have a complex XML message (data in XML format). We are exploring a way to extract all the XPaths of this XML message together with its element/attribute level data content. We tried XMLSpy and XML::Twig (xmltwig), but had no luck. xml_grep pulls data if we give it an XPath as input, but it has no option to browse all the XPaths of an XML message.
I have a well-formed XML message. I want to produce a list/report of:
- All XPaths of the XML message (browse all XPaths and list them)
- Each XPath with its data content (browse all XPaths and their data content, and list both)
Here is an example (input XML message):
<?xml version="1.0"?>
<PARTS>
  <TITLE>Computer Parts</TITLE>
  <PART>
    <ITEM>Motherboard</ITEM>
    <MANUFACTURER>ASUS</MANUFACTURER>
    <MODEL>P3B-F</MODEL>
    <COST> 123.00</COST>
  </PART>
  <PART>
    <ITEM>Video Card</ITEM>
    <MANUFACTURER>ATI</MANUFACTURER>
    <MODEL>All-in-Wonder Pro</MODEL>
    <COST> 160.00</COST>
  </PART>
  <PART>
    <ITEM>Sound Card</ITEM>
    <MANUFACTURER>Creative Labs</MANUFACTURER>
    <MODEL>Sound Blaster Live</MODEL>
    <COST> 80.00</COST>
  </PART>
  <PART>
    <ITEM>inch Monitor</ITEM>
    <MANUFACTURER>LG Electronics</MANUFACTURER>
    <MODEL> 995E</MODEL>
    <COST> 290.00</COST>
  </PART>
</PARTS>
The desired output (I created the following list manually):
/PARTS/TITLE Computer Parts
/PARTS/PART[1]/ITEM Motherboard
/PARTS/PART[1]/MANUFACTURER ASUS
/PARTS/PART[1]/MODEL P3B-F
/PARTS/PART[1]/COST 123.00
/PARTS/PART[2]/ITEM Video Card
/PARTS/PART[2]/MANUFACTURER ATI
............
..............
..................
...................
Is there any open source product that generates this kind of report for an XML message?
What is the way to extract the XPaths and the data content of each XPath?
Thanks for letting me pick the brains of this forum.
+++++
Thanks. The posted code outputs:
Field|Value
/*|
/*/*[1]|X
/*/*[2]|000000000
/*/*[3]|000000000
/*/*[4]|&
/*/*[5]|
I am not able to get the XPath as text (element names).
Here is the input XML:
<CorrectedW2Ind>X</CorrectedW2Ind>
<EmployeeSSN>000000000</EmployeeSSN>
<EmployerEIN>000000000</EmployerEIN>
<EmployerNameControlTxt>&</EmployerNameControlTxt>
<EmployerName>
  <BusinessNameLine1Txt>#</BusinessNameLine1Txt>
  <BusinessNameLine2Txt>#</BusinessNameLine2Txt>
</EmployerName>
<EmployerUSAddress>
  <AddressLine1Txt>0</AddressLine1Txt>
  <AddressLine2Txt>0</AddressLine2Txt>
  <CityNm>A</CityNm>
  <StateAbbreviationCd>PW</StateAbbreviationCd>
  <ZIPCd>00000</ZIPCd>
</EmployerUSAddress>
<EmployersUseGrp>
  <EmployersUseCd>A</EmployersUseCd>
  <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
  <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
  <EmployersUseCd>A</EmployersUseCd>
  <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
  <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
  <EmployersUseCd>A</EmployersUseCd>
  <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
  <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
  <EmployersUseCd>A</EmployersUseCd>
  <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
  <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
<EmployersUseGrp>
  <EmployersUseCd>A</EmployersUseCd>
  <PriorUSERRAContributionYr>00</PriorUSERRAContributionYr>
  <EmployersUseAmt>0</EmployersUseAmt>
</EmployersUseGrp>
a) Which lxml method should I use with the posted code to get the XPath as text (element names) along with the value?
b) Which lxml method should I use to get an aggregate count of repeating group nodes?
For example: XPath of EmployersUseGrp ====> 5
Edit ===== June 26, 2019 ======================
I am not able to open new questions; I get a "question limit exceeded" message. So I am posting the follow-up on this code here.
I am trying to use the Python code from the posted answer, and I am getting strange output.
I have a large XML file (inputf.xml), shown below. I set input = 'inputf.xml' in the posted code.
<?xml version="1.0" encoding="UTF-8"?>
<DataFileFor>
<DataR>
<Id>5070022019330a0050hq</Id>
<NUM>30221730001019</NUM>
<Postmark>2020-01-03T09:25:57.000-05:00</Postmark>
<TNO>47647</TNO>
.
.
.
.
.
</DataFileFor>
++++
When I grab a node by its XPath using xml_grep, I get the following.
xml_grep DataFileFor/DataR/Ret/W2 inputf.xml ===> output
<?xml version="1.0" ?>
<xml_grep version="0.7" date="Fri Jun 26 13:07:11 2020">
<file filename="inputf.xml">
<W2 Id="W2" dName="W2" sId="00000000" sVersionNum="String">
<CorrectedW2Ind>X</CorrectedW2Ind>
<EmployeeSSN>000000000</EmployeeSSN>
<EmployerEIN>000000000</EmployerEIN>
<EmployerNameControlTxt>S</EmployerNameControlTxt>
<EmployerName>
<BusinessNameLine1Txt>String</BusinessNameLine1Txt>
<BusinessNameLine2Txt>String</BusinessNameLine2Txt>
</EmployerName>
<EmployerUSAddress>
<AddressLine1Txt>String</AddressLine1Txt>
<AddressLine2Txt>String</AddressLine2Txt>
<CityNm>String</CityNm>
<StateAbbreviationCd>AL</StateAbbreviationCd>
<ZIPCd>000000000</ZIPCd>
.
.
.
.
.
</W2>
When I use the posted code, it does not produce readable XPaths. The output XPaths look like this:
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[10]|X
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[11]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[12]|00000000
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[13]|S
/DataFileFor/DataR/*[8]/*[2]/*[6]/*[3]/*[14]|String
The attributes Id="W2" dName="W2" sId="00000000" sVersionNum="String" are not showing up in the output.
What changes does the code need to fix this?
Thanks for your guidance.
Recommended answer
Just seen this. I wrote something that does this in Python; it outputs to a pipe-delimited CSV file. Feel free to use it. Happy to answer any questions, but don't expect an immediate response.
from lxml import etree


def parseXML(xmlFile, outputFile):
    """Write every node's XPath and text to a pipe-delimited file."""
    # etree.parse reads the file as bytes, so an XML encoding declaration is fine
    tree = etree.parse(xmlFile)
    with open(outputFile, 'w', encoding='utf-8') as f:
        f.write("%s|%s\n" % ("Field", "Value"))
        for e in tree.getroot().iter():
            # e.text is None for empty elements; strip layout whitespace
            f.write("%s|%s\n" % (tree.getpath(e), (e.text or "").strip()))


if __name__ == "__main__":
    print('Loading variables...')
    input = '16a.xml'
    output = input + '.csv'
    parseXML(input, output)
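On the follow-up questions: the `*` steps in the output most likely appear because those elements sit in a default namespace, which `getpath` cannot express by name. Attributes are also never visited, because `iter()` only yields nodes. The sketch below is one possible way around both, and it also counts repeating groups as asked in (b). The helper names (`local_name`, `readable_path`, `list_paths`) are made up for illustration and are not part of lxml. The paths are built from local element names, so treat them as readable labels rather than XPaths that are guaranteed to evaluate on a namespaced document.

```python
import re
from collections import Counter

from lxml import etree


def local_name(name):
    """Drop the '{namespace}' part of a tag or attribute name."""
    return etree.QName(name).localname


def readable_path(elem):
    """Build a path such as /PARTS/PART[2]/ITEM from local element names."""
    steps = []
    while elem is not None:
        name = local_name(elem.tag)
        parent = elem.getparent()
        if parent is not None:
            # Sibling elements (comments excluded) that share this name.
            same = [c for c in parent
                    if isinstance(c.tag, str) and local_name(c.tag) == name]
            if len(same) > 1:
                position = next(i for i, c in enumerate(same, 1) if c is elem)
                name = "%s[%d]" % (name, position)
        steps.append(name)
        elem = parent
    return "/" + "/".join(reversed(steps))


def list_paths(xml_bytes):
    """Return (rows, counts): path/value pairs and occurrences per generic path."""
    root = etree.fromstring(xml_bytes)
    rows, counts = [], Counter()
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments and processing instructions
        path = readable_path(elem)
        # Same path with the [n] indexes removed, used to count repeats.
        counts[re.sub(r"\[\d+\]", "", path)] += 1
        for attr, value in elem.attrib.items():
            rows.append(("%s/@%s" % (path, local_name(attr)), value))
        text = (elem.text or "").strip()
        if text or len(elem) == 0:
            rows.append((path, text))
    return rows, counts


if __name__ == "__main__":
    sample = b"""<PARTS>
      <TITLE>Computer Parts</TITLE>
      <PART id="a"><ITEM>Motherboard</ITEM></PART>
      <PART><ITEM>Video Card</ITEM></PART>
    </PARTS>"""
    rows, counts = list_paths(sample)
    for path, value in rows:
        print("%s|%s" % (path, value))
    for path, n in counts.items():
        if n > 1:
            print("%s ====> %d" % (path, n))
```

Note that `readable_path` rescans the siblings at every level, so it may be slow on a very large file; caching positions per parent would be the obvious next step if that turns out to matter.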