Parse updateinfo.xml

Question

I have been trying to parse the Amazon updateinfo.xml file for my university project in Python. An example file is as follows:

<?xml version="1.0" ?>
<updates>
<update author="linux-security@amazon.com" from="linux-security@amazon.com" status="final" type="security" version="1.4">
<id>AL2012-2014-001</id>
<title>Amazon Linux 2012.03 - AL2012-2014-001: important priority package update for libxml2</title>
<issued date="2014-10-19 15:48" />
<updated date="2014-10-19 15:48" />
<severity>important</severity>
<description>Package updates are available for Amazon Linux that fix the following vulnerabilities:
CVE-2012-5134:
	A heap-based buffer underflow flaw was found in the way libxml2 decoded certain entities. A remote attacker could provide a specially-crafted XML file that, when opened in an application linked against libxml2, would cause the application to crash or, potentially, execute arbitrary code with the privileges of the user running the application.
</description>
<references>
<reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-5134" id="CVE-2012-5134" title="" type="cve" />
<reference href="https://rhn.redhat.com/errata/RHSA-2012:1512.html" id="RHSA-2012:1512" title="" type="redhat" />
</references>
<pkglist>
<collection short="amazon-linux">
<name>Amazon Linux</name>
<package arch="x86_64" epoch="0" name="libxml2-debuginfo" release="10.23.26.ec2" version="2.7.8">
<filename>Packages/libxml2-debuginfo-2.7.8-10.23.26.ec2.x86_64.rpm</filename>
</package>
<package arch="x86_64" epoch="0" name="libxml2-devel" release="10.23.26.ec2" version="2.7.8">
<filename>Packages/libxml2-devel-2.7.8-10.23.26.ec2.x86_64.rpm</filename>
</package>
<package arch="x86_64" epoch="0" name="libxml2" release="10.23.26.ec2" version="2.7.8">
<filename>Packages/libxml2-2.7.8-10.23.26.ec2.x86_64.rpm</filename>
</package>
<package arch="x86_64" epoch="0" name="libxml2-static" release="10.23.26.ec2" version="2.7.8">
<filename>Packages/libxml2-static-2.7.8-10.23.26.ec2.x86_64.rpm</filename>
</package>
<package arch="x86_64" epoch="0" name="libxml2-python" release="10.23.26.ec2" version="2.7.8">
<filename>Packages/libxml2-python-2.7.8-10.23.26.ec2.x86_64.rpm</filename>
</package>
</collection>
</pkglist>
</update>
<update author="linux-security@amazon.com" from="linux-security@amazon.com" status="final" type="security" version="1.4">
<id>AL2012-2015-088</id>
<title>Amazon Linux 2012.03 - AL2012-2015-088: medium priority package update for gnutls</title>
<issued date="2015-07-29 20:47" />
<updated date="2015-07-29 20:47" />
<severity>medium</severity>
<description>Package updates are available for Amazon Linux that fix the following vulnerabilities:
CVE-2015-0294:
	It was discovered that GnuTLS did not check if all sections of X.509 certificates indicate the same signature algorithm. This flaw, in combination with a different flaw, could possibly lead to a bypass of the certificate signature check.

CVE-2015-0282:
	It was found that GnuTLS did not verify whether a hashing algorithm listed in a signature matched the hashing algorithm listed in the certificate. An attacker could create a certificate that used a different hashing algorithm than it claimed, possibly causing GnuTLS to use an insecure, disallowed hashing algorithm during certificate verification.

CVE-2014-8155:
	It was found that GnuTLS did not check activation and expiration dates of CA certificates. This could cause an application using GnuTLS to incorrectly accept a certificate as valid when its issuing CA is already expired.
</description>
<references>
<reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-8155" id="CVE-2014-8155" title="" type="cve" />
<reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0282" id="CVE-2015-0282" title="" type="cve" />
<reference href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0294" id="CVE-2015-0294" title="" type="cve" />
<reference href="https://rhn.redhat.com/errata/RHSA-2015:1457.html" id="RHSA-2015:1457" title="" type="redhat" />
</references>
<pkglist>
<collection short="amazon-linux">
<name>Amazon Linux</name>
<package arch="x86_64" epoch="0" name="gnutls-debuginfo" release="18.14.al12" version="2.8.5">
<filename>Packages/gnutls-debuginfo-2.8.5-18.14.al12.x86_64.rpm</filename></package>
<package arch="x86_64" epoch="0" name="gnutls" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-2.8.5-18.14.al12.x86_64.rpm</filename></package>
<package arch="x86_64" epoch="0" name="gnutls-devel" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-devel-2.8.5-18.14.al12.x86_64.rpm</filename></package>
<package arch="x86_64" epoch="0" name="gnutls-utils" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-utils-2.8.5-18.14.al12.x86_64.rpm</filename></package>
<package arch="x86_64" epoch="0" name="gnutls-guile" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-guile-2.8.5-18.14.al12.x86_64.rpm</filename></package>
<package arch="i686" epoch="0" name="gnutls-debuginfo" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-debuginfo-2.8.5-18.14.al12.i686.rpm</filename></package>
<package arch="i686" epoch="0" name="gnutls-devel" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-devel-2.8.5-18.14.al12.i686.rpm</filename></package>
<package arch="i686" epoch="0" name="gnutls-guile" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-guile-2.8.5-18.14.al12.i686.rpm</filename></package>
<package arch="i686" epoch="0" name="gnutls" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-2.8.5-18.14.al12.i686.rpm</filename></package>
<package arch="i686" epoch="0" name="gnutls-utils" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-utils-2.8.5-18.14.al12.i686.rpm</filename></package>
</collection>
</pkglist>
</update>
</updates>

I am trying to extract details such as the architecture, the package name, its release version, and the file name without the "Packages/" prefix.
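
Each of these fields is either an attribute or a child element of a <package> element. A minimal sketch of extracting them from one <package> copied from the sample above (the helper name `package_fields` is hypothetical, chosen for illustration):

```python
import xml.etree.ElementTree as ET

# One <package> element copied from the sample file above.
SNIPPET = """<package arch="x86_64" epoch="0" name="libxml2" release="10.23.26.ec2" version="2.7.8">
<filename>Packages/libxml2-2.7.8-10.23.26.ec2.x86_64.rpm</filename>
</package>"""

def package_fields(package):
    """Hypothetical helper: return (arch, name, release, filename) for a <package>."""
    # The <filename> text is "Packages/<rpm>"; keep only the part after the last '/'.
    filename = package.findtext('filename').rsplit('/', 1)[-1]
    return (package.get('arch'), package.get('name'),
            package.get('release'), filename)

fields = package_fields(ET.fromstring(SNIPPET))
print(fields)
```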

My question is: how do I do this efficiently for a file with some 300 of the above entries? With my limited knowledge of Python, I can manage to extract this from a single entry. But with so many entries (700+, a 1.5 GB file), running it in a for loop consumes a lot of resources and the output contains garbled text. How can I do this?

Recommended answer

Use the xml.etree.ElementTree module. In my experience, its performance is good.

For example:

import xml.etree.ElementTree as ET

tree = ET.parse('updateinfo.xml')  # reads the whole file into an in-memory tree
root = tree.getroot()              # the <updates> element

for update in root.findall('update'):
    # every <package> listed under this update's <pkglist>/<collection>
    for package in update.iterfind('pkglist/collection/package'):
        # drop the "Packages/" prefix from the file name
        filename = package.find('filename').text.replace('Packages/', '')
        print(package.get('arch'), package.get('name'),
              package.get('release'), filename)

This produces the following output (run with Python 3):

x86_64 libxml2-debuginfo 10.23.26.ec2 libxml2-debuginfo-2.7.8-10.23.26.ec2.x86_64.rpm
x86_64 libxml2-devel 10.23.26.ec2 libxml2-devel-2.7.8-10.23.26.ec2.x86_64.rpm
x86_64 libxml2 10.23.26.ec2 libxml2-2.7.8-10.23.26.ec2.x86_64.rpm
x86_64 libxml2-static 10.23.26.ec2 libxml2-static-2.7.8-10.23.26.ec2.x86_64.rpm
x86_64 libxml2-python 10.23.26.ec2 libxml2-python-2.7.8-10.23.26.ec2.x86_64.rpm
x86_64 gnutls-debuginfo 18.14.al12 gnutls-debuginfo-2.8.5-18.14.al12.x86_64.rpm
x86_64 gnutls 18.14.al12 gnutls-2.8.5-18.14.al12.x86_64.rpm
x86_64 gnutls-devel 18.14.al12 gnutls-devel-2.8.5-18.14.al12.x86_64.rpm
x86_64 gnutls-utils 18.14.al12 gnutls-utils-2.8.5-18.14.al12.x86_64.rpm
x86_64 gnutls-guile 18.14.al12 gnutls-guile-2.8.5-18.14.al12.x86_64.rpm
i686 gnutls-debuginfo 18.14.al12 gnutls-debuginfo-2.8.5-18.14.al12.i686.rpm
i686 gnutls-devel 18.14.al12 gnutls-devel-2.8.5-18.14.al12.i686.rpm
i686 gnutls-guile 18.14.al12 gnutls-guile-2.8.5-18.14.al12.i686.rpm
i686 gnutls 18.14.al12 gnutls-2.8.5-18.14.al12.i686.rpm
i686 gnutls-utils 18.14.al12 gnutls-utils-2.8.5-18.14.al12.i686.rpm
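
One caveat: ET.parse builds the entire document tree in memory, which may be costly for a 1.5 GB file. A common alternative, not part of the answer above, is ET.iterparse, which reports elements as the parser reaches them, so each <update> can be processed and then discarded. A minimal sketch, assuming the same un-namespaced structure as the sample (the function name `iter_packages` is hypothetical):

```python
import io
import xml.etree.ElementTree as ET

def iter_packages(source):
    """Hypothetical helper: stream (arch, name, release, filename) tuples
    from an updateinfo.xml path or binary file object."""
    context = ET.iterparse(source, events=('start', 'end'))
    _, root = next(context)  # the first event is the 'start' of the root <updates>
    for event, elem in context:
        if event != 'end' or elem.tag != 'update':
            continue
        # At the 'end' event the whole <update> subtree has been read.
        for package in elem.iterfind('pkglist/collection/package'):
            filename = package.findtext('filename', default='')
            yield (package.get('arch'), package.get('name'),
                   package.get('release'), filename.rsplit('/', 1)[-1])
        # Remove processed <update> elements from <updates> so they do not accumulate.
        root.clear()

# Usage on a small in-memory sample; pass 'updateinfo.xml' for the real file.
SAMPLE = b"""<updates><update><pkglist><collection short="amazon-linux">
<package arch="i686" epoch="0" name="gnutls" release="18.14.al12" version="2.8.5"><filename>Packages/gnutls-2.8.5-18.14.al12.i686.rpm</filename></package>
</collection></pkglist></update></updates>"""

for row in iter_packages(io.BytesIO(SAMPLE)):
    print(*row)  # prints: i686 gnutls 18.14.al12 gnutls-2.8.5-18.14.al12.i686.rpm
```

The ET.parse version is simpler; the streaming variant trades that simplicity for memory use that stays roughly proportional to one <update> at a time rather than to the whole file.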
