Parsing XML with multiple namespaces in Python using lxml

Published: 2021/5/30 21:52:18  Tags: python, xml, parsing, namespaces, lxml

Problem description

<?xml-stylesheet href="/Style Library/st/xslt/rss2.xsl" type="text/xsl" media="screen" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/" xmlns:dc="http://purl.org/dc/elements/1.1/">

  <channel>
    <title>Travel Advisories</title>
    <link>http://smartraveller.gov.au/countries/</link>
    <description>the Australian Department of Foreign Affairs and Trade's Smartraveller advisory service</description>
    <language>en</language>
    <webMaster>webmaster@dfat.gov.au</webMaster>
    <copyright>Copyright Commonwealth of Australia 2011</copyright>
    <ttl>60</ttl>
    <atom:link href="http://smartraveller.gov.au/countries/Documents/index.rss" rel="self" type="application/rss+xml" />
    <generator>zcms</generator>
    <image>
      <title>Advice</title>
      <link>http://smartraveller.gov.au/countries/</link>
      <url>/Style Library/st/images/dfat_logo_small.gif</url>
    </image>
    <item>
      <title>Czech Republic</title>
      <description>This travel advice has been reviewed. The level of our advice has not changed. Exercise normal safety precautions in the Czech Republic.</description>
      <link>http://smartraveller.gov.au/Countries/europe/eastern/Pages/czech_republic.aspx</link>
      <pubDate>26 Oct 2018 05:25:14 GMT</pubDate>
      <guid isPermaLink="false">cdbcc3d4-3a89-4768-ac1d-0221f8c99227 GMT</guid>
      <ta:warnings>
        <dc:coverage>Czech Republic</dc:coverage>
        <ta:level>2/5</ta:level>
        <dc:description>Exercise normal safety precautions</dc:description>
      </ta:warnings>
    </item>

I want to extract the value of <ta:level> under <ta:warnings> for each item in the feed. I have already tried the existing solutions I found online, but none of them worked for me. Basically, my XML contains multiple namespaces.

req = requests.request('GET', "https://smartraveller.gov.au/countries/documents/index.rss")
a = str(req.text).encode()
tree = etree.fromstring(a)

ns = {'TravelAd': 'https://smartraveller.gov.au/countries/documents/index.rss',
      'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}

e = tree.findall('{0}channel/{0}item/{0}warnings/{0}level'.format(ns))
for i in e:
    print(i.text)

Recommended answer

The XML declares multiple namespaces, but the only one you need to worry about is http://www.smartraveller.gov.au/schema/rss/travel_advisories/.

This is because the only elements in the path to your target that are in a namespace are ta:warnings and ta:level. The channel and item elements are in no namespace, so they need no prefix.

Example...

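from lxml import etree
import requests

# Download the feed and parse the raw bytes, so lxml handles the encoding itself.
req = requests.get("https://smartraveller.gov.au/countries/documents/index.rss")
tree = etree.fromstring(req.content)

# Only the ta prefix is needed; channel and item are in no namespace.
ns = {'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}

# findall() resolves the ta: prefix through the ns mapping.
e = tree.findall('channel/item/ta:warnings/ta:level', ns)
for i in e:
    print(i.text)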

This prints...

2/5
2/5
4/5
2/5
...and so on
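
To see why the attempt in the question finds nothing, it may help to compare the prefix form with the equivalent {uri}localname form (Clark notation). In the question, '{0}'.format(ns) inserts the text of the whole dictionary rather than a single namespace URI, and it does so in front of channel and item, which are in no namespace at all. The sketch below runs offline against a hypothetical two-item excerpt modelled on the feed (SAMPLE and "Example Country" are made up for illustration). As an assumption for portability, it falls back to the standard library's ElementTree, whose findall() accepts the same arguments, when lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the answer
except ImportError:  # assumption: fall back to the stdlib if lxml is missing
    import xml.etree.ElementTree as etree

TA = "http://www.smartraveller.gov.au/schema/rss/travel_advisories/"

# Hypothetical two-item excerpt modelled on the feed in the question.
SAMPLE = b"""<rss version="2.0"
    xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Czech Republic</title>
      <ta:warnings>
        <dc:coverage>Czech Republic</dc:coverage>
        <ta:level>2/5</ta:level>
      </ta:warnings>
    </item>
    <item>
      <title>Example Country</title>
      <ta:warnings>
        <dc:coverage>Example Country</dc:coverage>
        <ta:level>4/5</ta:level>
      </ta:warnings>
    </item>
  </channel>
</rss>"""

tree = etree.fromstring(SAMPLE)

# Prefix form: the ta: prefix is looked up in the mapping passed to findall().
by_prefix = tree.findall("channel/item/ta:warnings/ta:level", {"ta": TA})

# Clark notation: the namespace URI is written inline as {uri}localname.
by_clark = tree.findall(f"channel/item/{{{TA}}}warnings/{{{TA}}}level")

# Putting channel and item in the ta namespace as well matches nothing,
# because those two elements are in no namespace.
wrong = tree.findall(
    f"{{{TA}}}channel/{{{TA}}}item/{{{TA}}}warnings/{{{TA}}}level"
)

prefix_levels = [el.text for el in by_prefix]
clark_levels = [el.text for el in by_clark]
print(prefix_levels, clark_levels, len(wrong))
```

Both forms select the same elements; the prefix form is usually easier to read, and the prefix name used in the mapping does not have to match the one used in the document.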

If you want a list of the values instead, consider switching from findall() to xpath()...

e = tree.xpath('channel/item/ta:warnings/ta:level/text()', namespaces=ns)
print(e)

This prints...

['2/5', '2/5', '4/5', '2/5', and so on...]
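
A flat list of levels no longer says which item each value came from. One possible way to keep that link, sketched here rather than taken from the original answer, is to loop over the item elements and read the title and the level relative to each one. The helper name levels_by_title and the SAMPLE data are hypothetical; the third item deliberately lacks a ta:warnings block to show that findtext() returns None in that case. As an assumption for portability, the sketch falls back to the standard library's ElementTree when lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the answer
except ImportError:  # assumption: fall back to the stdlib if lxml is missing
    import xml.etree.ElementTree as etree

NS = {"ta": "http://www.smartraveller.gov.au/schema/rss/travel_advisories/"}

# Hypothetical excerpt modelled on the feed; the last item has no warnings.
SAMPLE = b"""<rss version="2.0"
    xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/">
  <channel>
    <item>
      <title>Czech Republic</title>
      <ta:warnings><ta:level>2/5</ta:level></ta:warnings>
    </item>
    <item>
      <title>Example Country</title>
      <ta:warnings><ta:level>4/5</ta:level></ta:warnings>
    </item>
    <item>
      <title>No Advice Yet</title>
    </item>
  </channel>
</rss>"""


def levels_by_title(xml_bytes):
    """Map each item's <title> text to its <ta:level> text (None if absent)."""
    root = etree.fromstring(xml_bytes)
    result = {}
    for item in root.findall("channel/item"):
        # Paths are relative to the current item element.
        title = item.findtext("title")
        level = item.findtext("ta:warnings/ta:level", namespaces=NS)
        result[title] = level
    return result


levels = levels_by_title(SAMPLE)
print(levels)
```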
