How to parse XML from a local file or URL with lxml?

Question

I'm trying to use lxml to parse XML, but I get an error: ValueError: invalid \x escape. Here is my code:

from lxml import etree
root=etree.fromstring('C:\Users\hptphuong\Desktop\xmltest.xml')

I'm new to lxml. Please help me fix this issue. Here is my XML content:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2000-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2000-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
   <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2000-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2001-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>

One more question: can lxml parse XML from a URL?

Thanks & Best Regards,

Recommended answer

You are using etree.fromstring() to try to load XML from a file, but that function parses XML directly from a string that contains the document itself; it does not accept a file name. On top of that, the path you pass contains backslashes, which Python treats as escape characters.

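The invalid \x escape error comes from those backslashes, and it is raised by Python rather than by lxml: in an ordinary string literal, \x must be followed by two hexadecimal digits, so the \xm in \xmltest is rejected before fromstring() even runs (a sequence such as \n, by contrast, is a valid escape for a newline). That exact message is from Python 2; Python 3 instead reports a SyntaxError about the \U in \Users. Even with the escapes fixed, fromstring() would try to parse the path text itself as XML and fail, because a file path is not an XML document.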
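To see the difference, the sketch below passes actual XML text to etree.fromstring() (a trimmed, two-book copy of the catalog from the question) and then passes a file path instead, which fails because the path is not XML. The path used here is a made-up example.

```python
from lxml import etree

# fromstring() parses the XML document text itself, not a file name.
# Trimmed, two-book copy of the catalog from the question.
xml_text = b"""<?xml version="1.0"?>
<catalog>
   <book id="bk101"><title>XML Developer's Guide</title></book>
   <book id="bk102"><title>Midnight Rain</title></book>
</catalog>"""

root = etree.fromstring(xml_text)        # returns the root Element
ids = [book.get("id") for book in root]  # iterating an Element yields its child elements
print(root.tag, ids)                     # catalog ['bk101', 'bk102']

# Handing it a file path makes lxml parse the path text as XML,
# which fails because a path is not an XML document.
path_error = None
try:
    etree.fromstring("C:/Users/example/Desktop/xmltest.xml")  # hypothetical path
except etree.XMLSyntaxError as exc:
    path_error = exc
print("fromstring() on a path raised:", type(path_error).__name__)
```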

To load your XML from a file, you need to use the etree.parse() function as follows:

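from lxml import etree

# parse() reads from a file name (or file object) and returns an ElementTree
tree = etree.parse(r'C:\Users\hptphuong\Desktop\xmltest.xml')
root = tree.getroot()  # the <catalog> element
# Print the loaded XML (Python 3 syntax)
print(etree.tostring(root, pretty_print=True).decode())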
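Once the file is loaded, the tree can be queried with the usual ElementTree methods or with XPath. The sketch below is self-contained: it writes a trimmed, two-book copy of the question's catalog to a temporary file as a stand-in for xmltest.xml, then parses it with etree.parse().

```python
import os
import tempfile
from lxml import etree

# Trimmed copy of the catalog from the question (stand-in for xmltest.xml).
xml_text = b"""<?xml version="1.0"?>
<catalog>
   <book id="bk101"><title>XML Developer's Guide</title><price>44.95</price></book>
   <book id="bk102"><title>Midnight Rain</title><price>5.95</price></book>
</catalog>"""

with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as f:
    f.write(xml_text)
    path = f.name

try:
    tree = etree.parse(path)  # ElementTree for the whole file
    root = tree.getroot()     # the <catalog> element
    for book in root.iter("book"):
        # findtext() returns the text of the first matching child element
        print(book.get("id"), book.findtext("title"), float(book.findtext("price")))
    # XPath query: ids of books priced under 10
    cheap = root.xpath("//book[price < 10]/@id")
    print(cheap)  # ['bk102']
finally:
    os.remove(path)  # clean up the temporary file
```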

When passing file paths to Python functions, you should normally prefix the string with r (making it a raw string literal) so that Python does not treat the \ characters in the path as escape sequences. For example, 'c:\temp' would actually pass c:<tab character>emp, because \t is converted into a tab character. Adding the r at the start prevents this.
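The effect of the r prefix can be checked directly in the interpreter; this small sketch compares the c:\temp example with and without it.

```python
# In an ordinary literal, "\t" is an escape sequence for a tab character.
plain = "c:\temp"
# With the r prefix, the backslash is kept as a literal character.
raw = r"c:\temp"

print("\t" in plain, "\t" in raw)  # True False
print(len(plain), len(raw))        # 6 7
print(raw)                         # c:\temp
```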

Alternatively, you can escape each backslash by doubling it:

path = "c:\\folder1\\folder2\\myfile.xml"
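For the second part of the question (parsing from a URL): according to the lxml documentation, etree.parse() also accepts an HTTP or FTP URL in place of a file name, but libxml2 does not handle HTTPS, and its network support may depend on how it was built. A more portable approach is to fetch the document with Python's urllib and pass the response object to etree.parse(), which accepts any file-like object. The helper name parse_xml_from_url and the example URL below are hypothetical.

```python
from urllib.request import urlopen
from lxml import etree

def parse_xml_from_url(url, timeout=10):
    """Fetch an XML document with urllib and parse it with lxml (hypothetical helper)."""
    # urlopen() handles http://, https:// and file:// URLs; the response
    # is a file-like object that yields bytes.
    with urlopen(url, timeout=timeout) as response:
        # etree.parse() accepts any object with a read() method.
        tree = etree.parse(response)
    return tree.getroot()

# Example usage (hypothetical URL, requires network access):
# root = parse_xml_from_url("https://example.com/catalog.xml")
# for book in root.iter("book"):
#     print(book.get("id"), book.findtext("title"))
```

Passing the response object rather than the URL string keeps the download in Python, so HTTPS, proxies and timeouts are handled by urllib instead of libxml2.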
