Python lxml XPath problem


Question

I'm trying to print/save a certain element's HTML from a web page.
I've retrieved the requested element's XPath from Firebug.

All I want is to save this element to a file, but I haven't managed to do so.
(I tried the XPath both with and without a /text() at the end.)

I would appreciate any help or past experience.
Thanks, David

import urllib2,StringIO
from lxml import etree

url='http://www.tutiempo.net/en/Climate/Londres_Heathrow_Airport/12-2009/37720.htm'
seite = urllib2.urlopen(url)
html = seite.read()
seite.close()
parser = etree.HTMLParser()
tree = etree.parse(StringIO.StringIO(html), parser)
xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td/table/tbody/tr/td[3]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr/td/table/tbody/text()"
elem = tree.xpath(xpath)


print elem[0].strip().encode("utf-8")

Recommended answer

Your XPath is rather long; try shorter ones first and see whether they match. One likely problem is "tbody": browsers create it automatically in the DOM, which is what Firebug shows, but the HTML markup usually does not contain it, so a path that steps through tbody may match nothing in the parsed source.
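As a rough sketch of the tbody point, the helper below drops every tbody step from an XPath copied out of a browser inspector. The name strip_tbody is hypothetical (it is not part of lxml), and the sketch assumes the page's own markup really has no tbody tags; if the source does contain them, those steps should stay.

```python
import re

def strip_tbody(xpath):
    """Remove 'tbody' steps (with an optional [n] index) from a simple XPath.

    Hypothetical helper for illustration only: it handles plain location
    paths such as the ones a browser inspector produces, not arbitrary
    XPath expressions.
    """
    # Match "/tbody" or "/tbody[1]" only when it is a whole step, i.e. it is
    # followed by another "/" or by the end of the expression.
    return re.sub(r"/tbody(?:\[\d+\])?(?=/|$)", "", xpath)

browser_xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td"
print(strip_tbody(browser_xpath))
```

Shortening the cleaned path one step at a time, and checking after each change whether tree.xpath(...) still returns a non-empty list, is one way to find the step that fails to match.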

Here's an example of how to use XPath results:

>>> from lxml import etree
>>> from StringIO import StringIO
>>> doc = etree.parse(StringIO("<html><body>a<something/>b</body></html>"), etree.HTMLParser())
>>> doc.xpath("/html/body/text()")
['a', 'b']

So you can simply "".join(...) all the text parts together if needed.
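A minimal sketch of both options, assuming Python 3 and a small made-up document (the markup and the id "box" are illustrative only, not taken from the page in the question). It joins the text nodes of an element, and it also serializes the element itself with etree.tostring, which may be closer to the goal of saving the element's HTML rather than only its text.

```python
from lxml import etree

# Made-up markup for illustration; the real page would be downloaded first.
markup = "<html><body><div id='box'>a<b>bold</b>c</div></body></html>"
root = etree.HTML(markup)  # parse with lxml's lenient HTML parser

# text() returns only the text nodes that are direct children of the div.
parts = root.xpath("//div[@id='box']/text()")
text = "".join(parts)

# Selecting the element itself (no /text()) allows serializing its markup.
div = root.xpath("//div[@id='box']")[0]
fragment = etree.tostring(div, encoding="unicode", method="html")

print(text)
print(fragment)
```

The serialized string in fragment can then be written to a file with an ordinary open(...) and write(...) call.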
