XPath: Find HTML element by *plain* text


Problem description

Please note: a more refined version of this question, with an appropriate answer, can be found here.

I would like to use the Selenium Python bindings to find elements with a given text on a web page. For example, suppose I have the following HTML:

<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>

I need to search by text and am able to find <someElement> using the following XPath:

//*[contains(text(), 'This can be found')]

I am looking for a similar XPath that lets me find <someOtherElement> using the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

I understand that this is because the nested em element "disrupts" the text flow of "This can not be found". Is it possible, with XPath, to ignore nestings like the one above?

Solution

You can use //*[contains(., 'This can not be found')].

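The context node . is converted to its string value, which is the concatenation of all its descendant text nodes ("This can " + "not" + " be found"), before being compared to 'This can not be found'. text() fails because it selects the separate text nodes, and in XPath 1.0 contains() only uses the first of them, "This can ".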
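To make the difference between . and text() concrete, here is a minimal sketch that mimics both conversions with Python's standard xml.etree.ElementTree instead of a real XPath engine (an assumption so it runs without a browser or Selenium). The helpers string_value and first_text_node are hypothetical names for illustration, not library functions.

```python
import xml.etree.ElementTree as ET

# The question's example page (it happens to be well-formed XML as well).
DOC = """<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>"""

PHRASE = "This can not be found"


def string_value(elem):
    """Mimics XPath's string value of '.': all descendant text in document order."""
    return "".join(elem.itertext())


def first_text_node(elem):
    """Mimics what contains(text(), ...) sees in XPath 1.0: only the first text node.

    elem.text is the text before the first child element, which is the first
    text node whenever the element starts with text, as it does here.
    """
    return elem.text or ""


root = ET.fromstring(DOC)
target = root.find(".//someOtherElement")

print(repr(first_text_node(target)))      # 'This can '
print(PHRASE in first_text_node(target))  # False
print(PHRASE in string_value(target))     # True
```

The em element splits the text into three nodes; only joining all of them, as . does, recovers the full sentence.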

Be careful, though: since you are using //*, it will match ALL enclosing elements that contain this string.

In your example case, it will match:

  • <someOtherElement>
  • and <body>
  • and <html>!
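The same emulation, applied to every element of the page, reproduces that list. This is a sketch using ElementTree as a stand-in for the browser's XPath engine (an assumption); in Selenium the real expression would be passed to find_elements(By.XPATH, ...) and would likewise return all three elements.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>"""

PHRASE = "This can not be found"


def string_value(elem):
    # XPath string value: every descendant text node, concatenated in document order.
    return "".join(elem.itertext())


root = ET.fromstring(DOC)

# Emulates //*[contains(., PHRASE)]: root.iter() visits every element,
# including <html> itself, in document order.
matches = [elem.tag for elem in root.iter() if PHRASE in string_value(elem)]
print(matches)  # ['html', 'body', 'someOtherElement']
```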

You could restrict this by targeting specific element tags or a specific section of your document (a <table> or <div> with a known id or class).
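A sketch of both restrictions, again emulated with ElementTree rather than a browser. The page below, with its two div sections and the ids "sidebar" and "main", is hypothetical and exists only to show the effect. With Selenium the corresponding expressions would be, for example, //someOtherElement[contains(., 'This can not be found')] or //div[@id='main']//*[contains(., 'This can not be found')].

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the <div> sections and their ids are invented for illustration.
DOC = """<html>
    <body>
        <div id="sidebar">
            <p>This can <em>not</em> be found</p>
        </div>
        <div id="main">
            <someOtherElement>This can <em>not</em> be found</someOtherElement>
        </div>
    </body>
</html>"""

PHRASE = "This can not be found"


def string_value(elem):
    return "".join(elem.itertext())


root = ET.fromstring(DOC)

# //*[contains(., PHRASE)]: every enclosing element matches.
everywhere = [e.tag for e in root.iter() if PHRASE in string_value(e)]

# //someOtherElement[contains(., PHRASE)]: restrict by tag name.
by_tag = [e.tag for e in root.iter("someOtherElement") if PHRASE in string_value(e)]

# //div[@id='main']//*[contains(., PHRASE)]: restrict to one section.
# section.iter() also yields the section itself, which //* below it excludes.
section = root.find(".//div[@id='main']")
in_section = [e.tag for e in section.iter()
              if e is not section and PHRASE in string_value(e)]

print(len(everywhere), by_tag, in_section)  # 6 ['someOtherElement'] ['someOtherElement']
```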


Edit, answering the OP's question in the comments about how to find the most nested elements matching the text condition:

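The accepted answer here suggests //*[count(ancestor::*) = max(//*/count(ancestor::*))] to select the most nested element. This requires XPath 2.0: max() and function calls used as a path step, as in //*/count(...), do not exist in XPath 1.0, which is the version browsers (and therefore Selenium) implement.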
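On its own, that expression picks the deepest element of the whole page, regardless of text. A sketch of what it computes on the question's page, with ElementTree standing in for an XPath 2.0 processor (an assumption; depths is a hypothetical helper that counts ancestor::* for each element):

```python
import xml.etree.ElementTree as ET

DOC = """<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>"""


def depths(root):
    """Map each element to count(ancestor::*), i.e. how many elements enclose it."""
    result = {}
    stack = [(root, 0)]
    while stack:
        elem, depth = stack.pop()
        result[elem] = depth
        stack.extend((child, depth + 1) for child in elem)
    return result


root = ET.fromstring(DOC)
depth = depths(root)
deepest = max(depth.values())

# Emulates //*[count(ancestor::*) = max(//*/count(ancestor::*))]
most_nested = [e.tag for e in root.iter() if depth[e] == deepest]
print(most_nested)  # ['em']
```

Here the winner is the <em>, not the element carrying the text, which is why the depth test has to be combined with the substring condition.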

I tested it, combined with your substring condition, here with this document:

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>

and this XPath 2.0 expression:

//*[contains(., 'This can not be found')]
   [count(ancestor::*) = max(//*/count(./*[contains(., 'This can not be found')]/ancestor::*))]

It matches only the element containing "This can not be found most nested".
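A sketch reproducing that result with ElementTree (an assumption, since Python's standard library has no XPath 2.0 engine). It takes the maximum depth among the matching elements only, which is what the nested count(./*[...]/ancestor::*) in the expression appears to work out to; string_value and depths are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>"""

PHRASE = "This can not be found"


def string_value(elem):
    return "".join(elem.itertext())


def depths(root):
    """Map each element to count(ancestor::*)."""
    result = {}
    stack = [(root, 0)]
    while stack:
        elem, depth = stack.pop()
        result[elem] = depth
        stack.extend((child, depth + 1) for child in elem)
    return result


root = ET.fromstring(DOC)
depth = depths(root)

# //*[contains(., PHRASE)] in document order.
matching = [e for e in root.iter() if PHRASE in string_value(e)]

# Keep only the matches sitting at the greatest depth (-1 if nothing matched).
deepest = max((depth[e] for e in matching), default=-1)
most_nested = [string_value(e) for e in matching if depth[e] == deepest]
print(most_nested)  # ['This can not be found most nested']
```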

There probably is a more elegant way to do that.
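One alternative that stays within XPath 1.0, and so can be used from Selenium, is a different technique from the one tested above: keep the matching elements that have no matching descendant, //*[contains(., 'This can not be found')][not(.//*[contains(., 'This can not be found')])]. It returns the innermost element of every match rather than only the globally deepest one, so on the document above both <someOtherElement> elements qualify. A sketch of that logic, once more emulated with ElementTree (an assumption; string_value is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>"""

PHRASE = "This can not be found"


def string_value(elem):
    return "".join(elem.itertext())


def has_matching_descendant(elem):
    # .//*[contains(., PHRASE)]: descendants only, so skip elem itself.
    return any(PHRASE in string_value(d) for d in elem.iter() if d is not elem)


root = ET.fromstring(DOC)

# //*[contains(., PHRASE)][not(.//*[contains(., PHRASE)])]
innermost = [string_value(e) for e in root.iter()
             if PHRASE in string_value(e) and not has_matching_descendant(e)]
print(innermost)  # ['This can not be found most nested', 'This can not be found']
```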
