XPath: Find HTML element by plain text
Please note: This question is a more refined version of a previous question.
I am looking for an XPath that lets me find elements with a given plain text in an HTML document. For example, suppose I have the following HTML:
<html>
<head>...</head>
<body>
<someElement>This can be found</someElement>
<nested>
<someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
</nested>
<yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>
I need to search by text and am able to find <someElement> using the following XPath:
//*[contains(text(), 'This can be found')]
I am looking for a similar XPath that lets me find <someOtherElement> and <yetAnotherElement> using the plain text "This can not be found". The following does not work:
//*[contains(text(), 'This can not be found')]
I understand that this is because of the nested em element that "disrupts" the text flow of "This can not be found". Is it possible via XPath to, in a way, ignore such nestings or similar ones?
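To make the "disrupted" text flow concrete, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree (my choice for illustration, not something the question uses). It shows that the element's direct text nodes, which are what text() selects, are two separate fragments, while the element's full string value is the sentence being searched for. In XPath 1.0, contains() given a node-set only looks at the first node, so it only ever sees the first fragment.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring(
    "<yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>"
)

# Direct text-node children of the element (what text() would select):
# the text before the first child, then the "tail" text after each child.
text_nodes = [element.text] + [child.tail for child in element]
text_nodes = [t for t in text_nodes if t]

# String value of the element: all descendant text concatenated.
string_value = "".join(element.itertext())

print(text_nodes)    # the two fragments around <em>
print(string_value)  # the full sentence, including the text inside <em>
```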
You can use
//*[contains(., 'This can not be found')]
[not(.//*[contains(., 'This can not be found')])]
This XPath consists of two parts:
- //*[contains(., 'This can not be found')]: The . refers to the context node, which contains() converts to its string value, i.e. the concatenated text of the element and all of its descendants. This part therefore selects every element whose string value contains 'This can not be found'. In the above example, these are <someOtherElement>, <yetAnotherElement>, <nested>, <body> and <html>.
- [not(.//*[contains(., 'This can not be found')])]: This removes every element that has a descendant element whose string value still contains 'This can not be found'. In the above example, it removes the unwanted ancestors <nested>, <body> and <html>, leaving only the innermost matches.
You can try these XPaths out here.
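As a rough way to check the logic without an XPath engine, the two predicates can be emulated in plain Python. This is a sketch under assumptions: it uses the standard library's xml.etree.ElementTree, whose XPath subset does not support contains(), so the filtering is written by hand rather than by evaluating the expression itself. The helper names string_value and find_deepest_by_text are made up for this illustration.

```python
import xml.etree.ElementTree as ET

HTML = """<html>
<head>...</head>
<body>
<someElement>This can be found</someElement>
<nested>
<someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
</nested>
<yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>"""


def string_value(element):
    """Concatenate all descendant text, like the XPath string value of '.'."""
    return "".join(element.itertext())


def find_deepest_by_text(root, needle):
    """Emulate //*[contains(., needle)][not(.//*[contains(., needle)])]."""
    matches = []
    for element in root.iter():  # every element in document order, like //*
        if needle not in string_value(element):
            continue  # first predicate fails
        # element.iter() yields the element itself first; skip it to get
        # only its descendants, like .//*
        descendants = list(element.iter())[1:]
        if any(needle in string_value(d) for d in descendants):
            continue  # a descendant also matches, so this is only an ancestor
        matches.append(element)
    return matches


root = ET.fromstring(HTML)
found_tags = [e.tag for e in find_deepest_by_text(root, "This can not be found")]
print(found_tags)
```

With a library that implements full XPath 1.0, the original expression can be passed in directly instead of being emulated like this.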