XPath: Find HTML element by plain text



Please note: This question is a more refined version of a previous question.

I am looking for an XPath that lets me find elements with a given plain text in an HTML document. For example, suppose I have the following HTML:

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>

I need to search by text and am able to find <someElement> using the following XPath:

//*[contains(text(), 'This can be found')]

I am looking for a similar XPath that lets me find <someOtherElement> and <yetAnotherElement> using the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

I understand that this is because of the nested em element that "disrupts" the text flow of "This can not be found". Is it possible via XPaths to, in a way, ignore such or similar nestings as the one above?
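The effect described above can be reproduced with a small sketch using only Python's standard library. This is an illustration rather than an XPath evaluation: `xml.etree.ElementTree` does not support `contains()`, so the sketch compares an element's leading text (`.text`, roughly what `text()` yields as its first node) with the concatenated text of the element and its descendants. The `<head>` element is left out of the sample for brevity.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (the <head> element is omitted here).
DOC = """<html>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>"""

root = ET.fromstring(DOC)
target = root.find(".//yetAnotherElement")

# Only the text that precedes the first child element (<em>).
first_text = target.text

# Text of the element and all of its descendants, joined in document order.
full_text = "".join(target.itertext())

print(repr(first_text))
print(repr(full_text))
```

The leading text stops at the `<em>` element, which is why a search on `text()` alone cannot see the whole phrase.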

Solution

You can use

//*[contains(., 'This can not be found')]
   [not(.//*[contains(., 'This can not be found')])]

This XPath consists of two parts:

  1. //*[contains(., 'This can not be found')]: Here . is the context node, and contains() converts it to its string value, that is, the concatenated text of the element and all of its descendants. This part therefore selects all elements whose string value contains 'This can not be found'. In the above example, these are <someOtherElement> and <yetAnotherElement>, but also their ancestors <nested>, <body> and <html>.
  2. [not(.//*[contains(., 'This can not be found')])]: This removes elements that have a descendant element whose string value still contains 'This can not be found'. It removes the unwanted elements <nested>, <body> and <html> in the above example.
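The two-part logic can also be mirrored in plain Python as a rough sketch. It does not run the XPath itself (the standard library's `xml.etree.ElementTree` has no `contains()` support); instead it emulates both predicates by hand. The helper names `string_value` and `innermost_matches` are hypothetical and exist only for this illustration, and the `<head>` element is left out of the sample.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (the <head> element is omitted here).
DOC = """<html>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>"""


def string_value(el):
    """Concatenated text of an element and all its descendants (like '.' in contains())."""
    return "".join(el.itertext())


def innermost_matches(root, needle):
    """Return tags of elements containing needle that have no descendant containing it."""
    hits = []
    for el in root.iter():
        # Part 1: the element's string value must contain the text.
        if needle not in string_value(el):
            continue
        # Part 2: drop the element if any descendant element still contains the text.
        if any(needle in string_value(d) for d in el.iter() if d is not el):
            continue
        hits.append(el.tag)
    return hits


root = ET.fromstring(DOC)
matches = innermost_matches(root, "This can not be found")
print(matches)  # ['someOtherElement', 'yetAnotherElement']
```

With an XPath 1.0 engine available, the expression from the answer can be evaluated directly; the emulation only serves to make the role of each predicate visible.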

You can try these XPaths out in an online XPath tester.
