Convert XML into Lists of Tags and Values with Python
Problem description
I'm learning Python and I'm trying to extract lists of all tags and corresponding values from any XML file. This is my code so far.
from lxml import etree


def ParseXml(XmlFile):
    try:
        # remove_blank_text drops whitespace-only text (indentation) between tags
        parser = etree.XMLParser(remove_blank_text=True, compact=True)
        tree = etree.parse(XmlFile, parser)
        root = tree.getroot()
        ListOfTags, ListOfValues, ListOfAttribs = [], [], []
        for elem in root.iter('*'):  # visit every element in document order
            Tag = elem.tag
            ListOfTags.append(Tag)
            # Elements without text (e.g. <File></File>) get an empty string
            value = elem.text
            if value is not None:
                ListOfValues.append(value)
            else:
                ListOfValues.append('')
            attrib = elem.attrib
            if attrib:
                ListOfAttribs.append([attrib])
            else:
                ListOfAttribs.append([])
        print('%s File parsed successfully' % XmlFile)
        return (ListOfTags, ListOfValues, ListOfAttribs)
    except Exception as e:
        print('Error while parsing XMLs : %s : %s' % (type(e), e))
        return ([], [], [])
For an XML input like this:
<?xml version="1.0" encoding="UTF-8"?>
<Application Version="2.01">
  <UserAuthRequest>
    <VendorApp>
      <AppName>SING</AppName>
    </VendorApp>
  </UserAuthRequest>
  <ApplicationRequest ID="12-123-AH">
    <GUID>ABD45129-PD1212-121DFL</GUID>
    <Type tc="200">Streaming</Type>
    <File></File>
    <FileExtension VendorCode="200">
      <Result>
        <ResultCode tc="1">Success</ResultCode>
      </Result>
    </FileExtension>
  </ApplicationRequest>
</Application>
This outputs three lists: tags, values and attributes. This part is working fine.
['Application', 'UserAuthRequest', 'VendorApp', 'AppName', 'ApplicationRequest', 'GUID', 'Type', 'File', 'FileExtension', 'Result', 'ResultCode']
['', '', '', 'SING', '', 'ABD45129-PD1212-121DFL', 'Streaming', '', '', '', 'Success']
[[{'Version': '2.01'}], [], [], [], [{'ID': '12-123-AH'}], [], [{'tc': '200'}], [], [{'VendorCode': '200'}], [], [{'tc': '1'}]]
But my problem is that I need each tag to include its parent tags as well, joined with dots. Below is the actual output I'm targeting:
['Application', 'UserAuthRequest', 'UserAuthRequest.VendorApp', 'UserAuthRequest.VendorApp.AppName', 'ApplicationRequest', 'ApplicationRequest.GUID', 'ApplicationRequest.Type', 'ApplicationRequest.File', 'ApplicationRequest.File.FileExtension', 'ApplicationRequest.File.FileExtension.Result', 'ApplicationRequest.File.FileExtension.Result.ResultCode']
How do I make this happen with Python? Or is there an alternative way to do this?
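One alternative, sketched here with only the standard library (xml.etree.ElementTree.iterparse), is to stream the file and keep a stack of the currently open elements: on each "start" event, the new element's path is the path on top of the stack plus its own tag. This is a sketch under a few assumptions: parse_xml_paths is a hypothetical name, whitespace-only text is treated as empty (roughly what remove_blank_text=True does in the lxml code above), and namespaced tags would show up in their {uri}local form.

```python
import io
import xml.etree.ElementTree as ET


def parse_xml_paths(source):
    """Return (paths, values, attribs) in document order.

    source: a filename or binary file object, as accepted by ET.iterparse.
    """
    paths, values, attribs = [], [], []
    stack = []  # (dotted path, index in the result lists) for each open element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # The parent's path is on top of the stack; the root has no parent.
            path = stack[-1][0] + "." + elem.tag if stack else elem.tag
            stack.append((path, len(paths)))
            paths.append(path)
            values.append("")  # placeholder: text is only guaranteed at "end"
            attribs.append(dict(elem.attrib))  # copy, since elem.clear() empties it
        else:
            _, index = stack.pop()
            text = elem.text or ""
            # Whitespace-only text (indentation) is treated as an empty value.
            values[index] = text if text.strip() else ""
            elem.clear()  # release text/attributes/children already recorded
    return paths, values, attribs


if __name__ == "__main__":
    demo = b'<a x="1"><b>hi</b><c/></a>'
    for result in parse_xml_paths(io.BytesIO(demo)):
        print(result)  # paths, then values, then attributes
```

Because an element's text is only guaranteed to be complete at its "end" event, the sketch reserves a slot at "start" and fills it in later; that keeps all three lists in the same pre-order as the question's root.iter('*') loop, and pass a filename instead of the BytesIO demo to parse a real file.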
Recommended answer
Here's a solution using only …
Notes:
- Building the dotted path is done via parse_node's ancestor_string argument, which is computed for each node in the tree and passed to its (direct) children.
- Having two functions (main and parse_xml), where one just calls the other, only adds a useless level of nesting, but it's a good practice that I got used to.

Output (I've run the script with Python 2.7 and Python 3.5):
['Application', 'Application.UserAuthRequest', 'Application.UserAuthRequest.VendorApp', 'Application.UserAuthRequest.VendorApp.AppName', 'Application.ApplicationRequest', 'Application.ApplicationRequest.GUID', 'Application.ApplicationRequest.Type', 'Application.ApplicationRequest.File', 'Application.ApplicationRequest.FileExtension', 'Application.ApplicationRequest.FileExtension.Result', 'Application.ApplicationRequest.FileExtension.Result.ResultCode']
['', '', '', 'SING', '', 'ABD45129-PD1212-121DFL', 'Streaming', '', '', '', 'Success']
[{'Version': '2.01'}, {}, {}, {}, {'ID': '12-123-AH'}, {}, {'tc': '200'}, {}, {'VendorCode': '200'}, {}, {'tc': '1'}]
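The answer's script itself is not reproduced on this page. Below is a minimal sketch of the technique its notes describe (each node's dotted path is computed once and handed down to its direct children as ancestor_string), written with the standard library's xml.etree.ElementTree. The names parse_node, parse_xml and main come from the notes; their signatures and bodies here are assumptions, not the original code. Whitespace-only text is treated as empty so the values match the output lists above.

```python
import io
import xml.etree.ElementTree as ET


def parse_node(node, ancestor_string, tags, values, attribs):
    """Record node, then recurse into its direct children (pre-order).

    Assumed signature: results are accumulated into the three lists.
    """
    # This node's full path: its ancestors' path plus its own tag.
    tag = ancestor_string + "." + node.tag if ancestor_string else node.tag
    tags.append(tag)
    text = node.text or ""
    values.append(text if text.strip() else "")  # whitespace-only text -> ''
    attribs.append(dict(node.attrib))
    for child in node:
        # Each direct child receives this node's path as its ancestor_string.
        parse_node(child, tag, tags, values, attribs)


def parse_xml(source):
    """source: a filename or file object, as accepted by ET.parse."""
    root = ET.parse(source).getroot()
    tags, values, attribs = [], [], []
    parse_node(root, "", tags, values, attribs)
    return tags, values, attribs


def main(source):
    # Thin wrapper that just calls parse_xml, as the notes describe.
    for result in parse_xml(source):
        print(result)


if __name__ == "__main__":
    # Pass a filename here to parse a real file; this uses a tiny inline document.
    main(io.BytesIO(b'<a x="1"><b>hi</b><c/></a>'))
```

Since parse_node recurses once per nesting level, a document nested deeper than Python's recursion limit (1000 by default) would raise RecursionError; an explicit stack avoids that, at the cost of slightly less direct code.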