Get text content of an HTML element using XPath?

Views: 92 | Published: 2018/6/15 12:37:49 | Tags: html xml xpath html-parsing

Question

Take a look at this HTML:

<div>
    <p>
    <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
<div>
    <p>
    <span class="abc">Keyboard</span> $20 
    </p>
    <a href="/add">Add to cart</a>
</div>

Using XPath, I want to extract Monitor $300 and Keyboard $20. I use this XPath:

 //div[a[contains(., "Add to cart")]]/p/text()

But it selects <span class="abc">Monitor</span> <b>$300</b>. I don't want the tags. How do I get only the text?

Recommended answer

You want to select all descendant text nodes, not just the child text nodes:

//div[a[contains(., "Add to cart")]]/p//text()

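Note the double slash between p and text() there. p/text() matches only text nodes that are direct children of p, while p//text() also matches text nested inside elements such as span and b.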
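To make the difference between the single and double slash concrete, here is a minimal sketch using lxml. It assumes the markup from the question is wrapped in a hypothetical <root> element so that it parses as one XML document; the names DOC, base and non_blank are illustrative only.

```python
from lxml import etree

# Markup from the question, wrapped in an assumed <root> element
# so that it forms a single well-formed XML document.
DOC = """<root>
<div>
    <p>
    <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
<div>
    <p>
    <span class="abc">Keyboard</span> $20
    </p>
    <a href="/add">Add to cart</a>
</div>
</root>"""

tree = etree.fromstring(DOC)
base = '//div[a[contains(., "Add to cart")]]/p'

# Single slash: only text nodes that are direct children of <p>.
child_text = tree.xpath(base + "/text()")
# Double slash: text nodes at any depth below <p>.
descendant_text = tree.xpath(base + "//text()")


def non_blank(nodes):
    """Strip each text node and drop the whitespace-only ones."""
    return [str(t).strip() for t in nodes if t.strip()]


print(non_blank(child_text))       # ['$20']
print(non_blank(descendant_text))  # ['Monitor', '$300', 'Keyboard', '$20']
```

With the single slash, the only non-blank text that is a direct child of a p element is the "$20" following the span; everything else sits inside span or b and is reached only by the double slash.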

This may also return a lot of inter-tag whitespace, though, so you'll need to clean that up. Example using lxml:

>>> import lxml.etree as ET
>>> tree = ET.fromstring('''<div>
... <div>
...     <p>
...     <span class="abc">Monitor</span> <b>$300</b>
...     </p>
...     <a href="/add">Add to cart</a>
... </div>
... <div>
...     <p>
...     <span class="abc">Keyboard</span> $20 
...     </p>
...     <a href="/add">Add to cart</a>
... </div>
... </div>''')
>>> tree.xpath('//div[a[contains(., "Add to cart")]]/p//text()')
['\n    ', 'Monitor', ' ', '$300', '\n    ', '\n    ', 'Keyboard', ' $20 \n    ']
>>> res = _
>>> [txt for txt in (txt.strip() for txt in res) if txt]
['Monitor', '$300', 'Keyboard', '$20']
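If one string per product is wanted (for example "Monitor $300") rather than a flat list of fragments, an alternative is to select the p elements first and let XPath's normalize-space() function collapse the whitespace. This is a different technique from the strip-and-filter approach in the answer, shown here only as a sketch; the <root> wrapper and the names DOC, paragraphs and labels are assumptions made for illustration.

```python
from lxml import etree

# Markup from the question, wrapped in an assumed <root> element
# so that it forms a single well-formed XML document.
DOC = """<root>
<div>
    <p>
    <span class="abc">Monitor</span> <b>$300</b>
    </p>
    <a href="/add">Add to cart</a>
</div>
<div>
    <p>
    <span class="abc">Keyboard</span> $20
    </p>
    <a href="/add">Add to cart</a>
</div>
</root>"""

tree = etree.fromstring(DOC)

# Select the <p> elements themselves rather than their text nodes.
paragraphs = tree.xpath('//div[a[contains(., "Add to cart")]]/p')

# normalize-space(.) takes the string value of each <p> (all descendant
# text concatenated), trims both ends, and collapses runs of whitespace
# into single spaces.
labels = [str(p.xpath("normalize-space(.)")) for p in paragraphs]

print(labels)  # ['Monitor $300', 'Keyboard $20']
```

Evaluating the expression once per element keeps the text of each product together, which the flat //text() result does not do on its own.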
