How can I find a node in HTML which has marked-up text by searching for the plaintext?

Problem description

I'm trying to find the "closest" nodes in an HTML document whose inner text holds a specific string, using XPath or JavaScript (Node) libraries. In an HTML snippet like

<p>Lorem ipsum dolor sit <strong>amet, <em>{cons</em>ectetur} adipiscing elit.</strong> Morbi rhoncus lacinia orci a dapibus. Nulla facilisi. Sed id nibh ornare, aliquet ante nec, efficitur leo. Sed viverra ex turpis,</p>

if I'm looking for words that match {cons.*tur}, so in this case {consectetur}, I want to find the <strong> node rather than the <p> node, because that's the smallest node that contains it.

Edit: the <strong> here is just an example. It could be any tagName, and it could be deeply nested. The word I'm looking for could also be spread over more than the two nesting levels shown in the example above.

Further edit: I'm actually looking for a pattern, so something like //div[contains(., 'consectetur')] wouldn't work.

Recommended answer

"if I'm looking for words that match {cons.*tur}, so in this case {consectetur}, I want to find the <strong> node rather than the <p> node because that's the smallest node that has it."

You need an expression that selects the last descendant element whose string value matches your pattern. So

/descendant::*[contains(.,'{consectetur}')][last()]

If more than one such element could match (in different branches), then you need an expression that selects every element matching your pattern that has no descendant also matching it.

//*[contains(.,'{consectetur}') and not(.//*[contains(.,'{consectetur}')])]
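
The same "matches, but no descendant matches" rule can be tried outside XPath with a real regular expression. The following is only a minimal sketch using Python's standard library. The names SNIPPET, string_value and smallest_matching_elements are illustrative, not from the original answer, and it assumes the markup is well-formed XML (real-world HTML would need an HTML parser instead).

```python
import re
import xml.etree.ElementTree as ET

# Shortened version of the snippet from the question (assumed well-formed).
SNIPPET = (
    "<p>Lorem ipsum dolor sit <strong>amet, <em>{cons</em>ectetur} "
    "adipiscing elit.</strong> Morbi rhoncus lacinia orci a dapibus.</p>"
)


def string_value(element):
    """Concatenate all text inside the element, like an XPath string value."""
    return "".join(element.itertext())


def smallest_matching_elements(root, pattern):
    """Return elements whose text matches `pattern` while no descendant's text does."""
    regex = re.compile(pattern)
    result = []
    for element in root.iter():  # document order, root included
        if not regex.search(string_value(element)):
            continue
        # element.iter() yields the element itself first, so skip it.
        descendants = list(element.iter())[1:]
        if not any(regex.search(string_value(d)) for d in descendants):
            result.append(element)
    return result


root = ET.fromstring(SNIPPET)
matches = smallest_matching_elements(root, r"\{cons.*tur\}")
print([m.tag for m in matches])  # prints ['strong']
```

Here the <p> is skipped because its <strong> descendant also matches, and the <em> is skipped because it only contains "{cons".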

About the pattern

If you want to use regular expressions, you need at least the XPath 2.0 functions (such as matches()). In XPath 1.0, your current pattern {cons.*tur} can be expressed as

contains(substring-after(.,'{cons'),'tur}')
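
To see why this XPath 1.0 workaround behaves like the pattern, the two checks can be compared on plain strings. This is a hedged sketch only: xpath1_style_match and regex_match are hypothetical helper names, the sample strings are made up for illustration, and the regex is compiled with re.DOTALL on the assumption that "." should also match newlines.

```python
import re


def xpath1_style_match(text):
    """Mirror contains(substring-after(., '{cons'), 'tur}') on a plain string."""
    # partition() splits at the first '{cons'; `found` is empty when it is absent,
    # which corresponds to substring-after() returning an empty string.
    _, found, rest = text.partition("{cons")
    return bool(found) and "tur}" in rest


def regex_match(text):
    """Search for the pattern {cons.*tur}, letting '.' also match newlines."""
    return re.search(r"\{cons.*tur\}", text, re.DOTALL) is not None


# Illustrative inputs, not taken from the original question.
samples = [
    "amet, {consectetur} adipiscing elit.",
    "{constur}",
    "tur} appears before {cons",
    "{cons without the ending",
    "no braces at all",
]

for sample in samples:
    print(repr(sample), xpath1_style_match(sample), regex_match(sample))
```

Both functions agree on every sample above, because anything after the first "{cons" is still available for "tur}" to be found in.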
