Why is the slash at the end of lxml.html.parse() important?

Views: 164 | Published: 2020/5/4 8:36:50 | Tags: python, lxml

Question

I am using lxml to scrape HTML. This code works:

import lxml.html
lxml.html.parse("http://google.com/")

This code does not:

lxml.html.parse("http://google.com")

Why does the slash at the end of the URL matter? Thank you.

To be clear, here is the traceback that Python gives me for the second call:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/davidfaux/epd-7.2-2-rh5-x86/lib/python2.7/site-packages/lxml/html/__init__.py", line 692, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2953, in lxml.etree.parse (src/lxml/lxml.etree.c:56204)
  File "parser.pxi", line 1533, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:82287)
  File "parser.pxi", line 1562, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:82580)
  File "parser.pxi", line 1462, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:81619)
  File "parser.pxi", line 1002, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:78528)
  File "parser.pxi", line 569, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:74472)
  File "parser.pxi", line 650, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:75363)
  File "parser.pxi", line 588, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:74665)
IOError: Error reading file 'http://google.com': failed to load HTTP resource
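
The traceback reports only that the HTTP resource failed to load; it does not say why. Since the two calls differ only in whether the URL's path is empty, one possible workaround is to normalize the URL so that it always has a path before handing it to lxml.html.parse(). The sketch below is written for Python 3 and uses only the standard library (in Python 2.7, as in the traceback, the same functions live in the urlparse module). The helper name ensure_path is hypothetical and not part of lxml.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_path(url):
    """Return url unchanged, except that an empty path becomes "/".

    Hypothetical helper for illustration; it is not part of lxml.
    """
    parts = urlsplit(url)
    # Only touch absolute URLs (those with a host) whose path is empty.
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(ensure_path("http://google.com"))   # http://google.com/
print(ensure_path("http://google.com/"))  # http://google.com/
```

The result could then be passed along as lxml.html.parse(ensure_path(url)). This only avoids the empty-path form that fails above; it does not explain the underlying cause.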
