Error with parse function in lxml


Question

I have installed lxml 2.2.2 on Windows (I'm using Python 2.6.5). I tried this simple command:

from lxml.html import parse
p = parse('http://www.google.com').getroot()

But I get the following error:

Traceback (most recent call last):
  File "", line 1, in
    p=parse('http://www.google.com').getroot()
  File "C:\Python26\lib\site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\html\__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file 'http://www.google.com': failed to load external entity "http://www.google.com"

I am clueless as to what to do next, as I am a newbie to Python. Please guide me to solve this error. Thanks in advance!! :)

Recommended answer

lxml.html.parse does not fetch URLs.

Here's how to do it with urllib2:

>>> from urllib2 import urlopen
>>> from lxml.html import parse
>>> page = urlopen('http://www.google.com')
>>> p = parse(page)
>>> p.getroot()
<Element html at 1304050>
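For readers on Python 3, where urllib2 no longer exists, a rough equivalent of the session above might look like the sketch below. It assumes `urllib.request.urlopen` as the replacement for `urllib2.urlopen`; the helper names `parse_stream` and `parse_url` and the 10-second timeout are hypothetical choices for illustration, not part of lxml.

```python
import io
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

from lxml.html import parse


def parse_stream(stream):
    """Parse HTML from any file-like object and return the root element."""
    return parse(stream).getroot()


def parse_url(url, timeout=10):
    """Fetch `url` with urllib, then let lxml parse the response body."""
    # timeout=10 is an assumed default, chosen only for this sketch
    with urlopen(url, timeout=timeout) as response:
        return parse_stream(response)


# Offline demonstration: BytesIO stands in for the HTTP response.
root = parse_stream(io.BytesIO(b"<html><body><p>hi</p></body></html>"))
print(root.tag)
```

Calling `parse_url('http://www.google.com')` would then play the role of `parse(page)` in the session above, with the download handled by Python rather than by lxml.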


Update
Steven is right. lxml.etree.parse should accept and load URLs. I missed that. I've tried deleting this answer, but I'm not allowed.

I retract my statement about it not fetching URLs.
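Since parse can load a URL itself but may still fail as in the question's traceback, one option is to try the direct call first and fall back to fetching in Python. The following is only a sketch under assumptions: it targets Python 3 (`urllib.request` instead of urllib2), and `load_root` with its `fetch` parameter is a hypothetical helper, not an lxml API.

```python
from urllib.request import urlopen

from lxml.html import parse


def load_root(url, fetch=urlopen):
    """Return the root element, trying lxml's own loader first."""
    try:
        # lxml hands the string to libxml2, which opens the file or URL itself.
        return parse(url).getroot()
    except OSError:  # IOError is an alias of OSError in Python 3
        # Fall back to fetching in Python and parsing the file-like response.
        with fetch(url) as response:
            return parse(response).getroot()
```

The `fetch` argument exists only so the fallback can be swapped out, for example with a function that returns an in-memory stream when trying the helper without network access.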
