How to parse XML from a local file or URL with lxml?
Question
I am trying to use lxml to parse XML, but I get an error: ValueError: invalid \x escape. Here is my code:
from lxml import etree
root=etree.fromstring('C:\Users\hptphuong\Desktop\xmltest.xml')
I'm new to lxml. Please help me fix this issue. Here is my XML content:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon's Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology
conference, tempers fly as feathers get ruffled.</description>
</book>
<book id="bk107">
<author>Thurman, Paula</author>
<title>Splish Splash</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-11-02</publish_date>
<description>A deep sea diver finds true love twenty
thousand leagues beneath the sea.</description>
</book>
<book id="bk108">
<author>Knorr, Stefan</author>
<title>Creepy Crawlies</title>
<genre>Horror</genre>
<price>4.95</price>
<publish_date>2000-12-06</publish_date>
<description>An anthology of horror stories about roaches,
centipedes, scorpions and other insects.</description>
</book>
<book id="bk109">
<author>Kress, Peter</author>
<title>Paradox Lost</title>
<genre>Science Fiction</genre>
<price>6.95</price>
<publish_date>2000-11-02</publish_date>
<description>After an inadvertant trip through a Heisenberg
Uncertainty Device, James Salway discovers the problems
of being quantum.</description>
</book>
<book id="bk110">
<author>O'Brien, Tim</author>
<title>Microsoft .NET: The Programming Bible</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-09</publish_date>
<description>Microsoft's .NET initiative is explored in
detail in this deep programmer's reference.</description>
</book>
<book id="bk111">
<author>O'Brien, Tim</author>
<title>MSXML3: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>36.95</price>
<publish_date>2000-12-01</publish_date>
<description>The Microsoft MSXML3 parser is covered in
detail, with attention to XML DOM interfaces, XSLT processing,
SAX and more.</description>
</book>
<book id="bk112">
<author>Galos, Mike</author>
<title>Visual Studio 7: A Comprehensive Guide</title>
<genre>Computer</genre>
<price>49.95</price>
<publish_date>2001-04-16</publish_date>
<description>Microsoft Visual Studio 7 is explored in depth,
looking at how Visual Basic, Visual C++, C#, and ASP+ are
integrated into a comprehensive development
environment.</description>
</book>
</catalog>
One more question: can lxml parse XML from a URL?
Thanks & Best Regards,
Recommended answer
The reason you are getting the error message invalid \x escape is that you are using etree.fromstring() to attempt to load XML from a file. This function parses XML directly from a string, and you are passing it a path with \ characters in it.
In effect the function would try to parse your file path as XML. The error itself comes from the string literal: the path contains \ escape sequences followed by invalid characters. For instance, the \x in \xmltest.xml must be followed by two hexadecimal digits, whereas \n would be a valid newline escape.
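As a rough illustration of why the literal is rejected before lxml is ever involved, the sketch below compiles two pieces of source text. The shortened path C:\xmltest.xml and the helper name literal_compiles are made up for this example. On Python 3 the failure surfaces as a SyntaxError rather than the ValueError shown in the question, so both are caught.

```python
# Source text of two assignments, held in raw strings so that this example
# itself compiles. The path is a shortened, hypothetical one.
bad_literal = r"path = 'C:\xmltest.xml'"    # \x is followed by "ml", not hex digits
good_literal = r"path = r'C:\xmltest.xml'"  # raw string: backslashes are kept as-is


def literal_compiles(source):
    """Return True if the given source text compiles, False if Python rejects it."""
    try:
        compile(source, '<demo>', 'exec')
        return True
    except (SyntaxError, ValueError):
        # Python 3 raises SyntaxError here; Python 2 raised ValueError.
        return False


print(literal_compiles(bad_literal))
print(literal_compiles(good_literal))
```

The first call reports that the plain literal is rejected, and the second shows that the same path with an r prefix is accepted.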
To load your XML from a file, you need to use the etree.parse() function as follows:
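from lxml import etree
# etree.parse() reads the file at the given path and returns an ElementTree
tree = etree.parse(r'C:\Users\hptphuong\Desktop\xmltest.xml')
root = tree.getroot()
# Print the loaded XML
print(etree.tostring(root, encoding='unicode'))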
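The distinction between fromstring() and parse() can be sketched in a self-contained way, along with one possible approach to the URL part of the question. The helper names below are hypothetical, the sample markup is a cut-down version of the catalog, and the fallback import is an assumption so the sketch still runs where lxml is not installed. Downloading the bytes with urllib and handing them to fromstring() is only one option; lxml's documentation also describes passing an HTTP or FTP URL straight to parse().

```python
import os
import tempfile
from urllib.request import urlopen

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which offers the same calls.
    import xml.etree.ElementTree as etree

# Cut-down sample based on the catalog in the question.
XML_TEXT = """<catalog>
<book id="bk101">
<title>XML Developer's Guide</title>
</book>
</catalog>"""


def load_from_string(xml_text):
    """Parse XML markup that is already in memory; returns the root element."""
    return etree.fromstring(xml_text)


def load_from_file(path):
    """Parse an XML file on disk and return its root element."""
    return etree.parse(path).getroot()


def load_from_url(url):
    """Download the document, then parse the raw bytes (not called in this sketch)."""
    with urlopen(url) as response:
        return etree.fromstring(response.read())


# Write the sample to a temporary file so the file-based route can be exercised.
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as handle:
    handle.write(XML_TEXT)
    temp_path = handle.name

try:
    root_from_string = load_from_string(XML_TEXT)
    root_from_file = load_from_file(temp_path)
finally:
    os.remove(temp_path)

print(root_from_string.tag, root_from_file.find('book').get('id'))
```

Both routes end up with the same kind of root element; the only difference is whether the argument is the markup itself or the location of the markup.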
When passing Windows file paths to Python functions, you should normally prefix the string literal with r (making it a raw string) so that Python does not treat the \ characters in the path as escapes. For example, 'c:\temp' would actually pass c:<tab character>emp, i.e. the \t is converted into a tab character. Adding the r prefix stops this from happening.
Alternatively, you can escape each backslash by doubling it:
path = "c:\\folder1\\folder2\\myfile.xml"
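To make the difference between the three spellings concrete, here is a small sketch that compares them. The variable names are chosen just for this illustration.

```python
escaped = 'c:\temp'    # \t is interpreted as a single tab character
raw = r'c:\temp'       # raw string: the backslash is kept literally
doubled = 'c:\\temp'   # each \\ produces one backslash

# The escaped version is one character shorter because \t collapsed into a tab.
print(len(escaped), len(raw), len(doubled))

# The raw and doubled spellings produce exactly the same string.
print(raw == doubled)

# The escaped version contains a tab and no backslash at all.
print('\t' in escaped, '\\' in escaped)
```

Either the r prefix or doubled backslashes gives the path Windows expects; which one to use is largely a matter of readability.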