How to get the XPath of invalid HTML?

Problem description

I am trying to extract content from HTML with XPath using Xidel, and recently encountered invalid HTML.

I use Firefox to get the XPath, but Firefox automatically adds missing tags, so the XPath doesn't match.

Can I stop Firefox from doing this, or can you suggest a way to deal with it?

Also, is there some kind of reverse XPath, to get the XPath of some text?

Solution

If the XML/HTML is invalid, Xidel will repair it before applying the XPath, although it might repair it differently than Firefox does. You can see how it was changed with:

xidel http://yourwebpage -e / --html

If you save that output and open it in Firefox, you can build the XPath against the repaired markup instead of the original page.

Generally, the repair might change intermediate tags, but it will probably keep classes and ids unchanged. You can therefore replace an absolute XPath like /html/body/div[2]/div[@id="foo"]/p[1]/p/text() with one anchored on an id, such as //div[@id="foo"]/p[1]/span/text() or //div[@id="foo"]//span[1]/text()
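
To illustrate why the id-anchored form is more robust, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree (not Xidel or Firefox). The two markup strings are hypothetical, hand-written stand-ins for two different repairs of the same broken page, and the names REPAIR_A, REPAIR_B, ABSOLUTE, ANCHORED and texts are made up for this example. ElementTree only supports a subset of XPath, so the element's .text is read instead of using text().

```python
import xml.etree.ElementTree as ET

# Hypothetical repair A: the parser wrapped the target div in extra divs.
REPAIR_A = """<html><body>
<div>header</div>
<div><div id="foo"><p><span>target</span></p></div></div>
</body></html>"""

# Hypothetical repair B: a different parser kept the target div at top level.
REPAIR_B = """<html><body>
<div id="foo"><p><span>target</span></p></div>
</body></html>"""

# Absolute path, relative to the <html> root, that depends on every
# intermediate tag.
ABSOLUTE = "./body/div[2]/div[@id='foo']/p[1]/span"
# Path anchored on the id, ignoring whatever tags surround it.
ANCHORED = ".//div[@id='foo']//span"


def texts(markup, path):
    """Return the text of every element matched by path in markup."""
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(path)]


for name, markup in (("A", REPAIR_A), ("B", REPAIR_B)):
    print(name, texts(markup, ABSOLUTE), texts(markup, ANCHORED))
```

In this sketch the absolute path matches only repair A, because repair B has no second div under body, while the id-anchored path finds the span in both trees.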
