Extract Text from BR tags
Question
I have been able to extract text using Selenium before, but I'm having trouble extracting just the numbers between <BR> tags. Here is a sample of the HTML code.
<DIV class="pagebodydiv">
<TABLE CLASS="datadisplaytable" SUMMARY="This table will display needed information." WIDTH="100%">
<TR>
<TD CLASS="nttitle" scope="colgroup" >Working Title</A></TD>
</TR>
<TR>
<TD CLASS="ntdefault">
Further information on subject
<BR>
3.000
<BR>
2.000
<BR>
<BR>
<BR>
<BR>
<BR>
More information
<BR>
<BR>
</TABLE>
So far I have tried using:
WebElement creditinfo = driver.findElement(By.xpath("//div[@class='pagebodydiv']/text()[preceding-sibling::br]"));
and
Elements numInfo = doc.select(br);
However, I keep running into a NoSuchElementException error, an InvalidSelectorException error, or it just doesn't return anything. Any ideas on how I can get the information?
Recommended answer
You actually can select the text nodes between <BR> tags. In HTML (not XHTML) they act as self-closing tags (like <br/>). Based on that behaviour, you could select all text nodes that have a <BR> tag immediately before and immediately after them using:
//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS="ntdefault"]
/text()[preceding-sibling::node()[1][self::BR]
and following-sibling::node()[1][self::BR]]
That would also select the whitespace-only text nodes (the blank lines) and any text that is not a number.
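To see which nodes the expression above matches, here is a minimal sketch that runs it with the JDK's built-in XPath engine instead of Selenium. It rests on several assumptions: the sample markup is trimmed and rewritten as well-formed XML (every <BR> becomes <BR/>, the title row is left out), and the class name BrTextNodes and its helper select are made up for illustration. As far as I know, Selenium's findElement only returns elements, so an XPath ending in text() is a likely source of the InvalidSelectorException mentioned in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BrTextNodes {
    // Assumption: a trimmed, well-formed copy of the question's markup (<BR> written as <BR/>).
    static final String SAMPLE =
        "<DIV class=\"pagebodydiv\">"
        + "<TABLE CLASS=\"datadisplaytable\"><TR><TD CLASS=\"ntdefault\">\n"
        + "Further information on subject\n<BR/>\n3.000\n<BR/>\n2.000\n<BR/>\n<BR/>\n<BR/>\n"
        + "<BR/>\n<BR/>\nMore information\n<BR/>\n<BR/>\n"
        + "</TD></TR></TABLE></DIV>";

    // The expression from the answer: text nodes with a BR directly before and directly after.
    static final String BETWEEN_BR =
        "//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS='ntdefault']"
        + "/text()[preceding-sibling::node()[1][self::BR]"
        + " and following-sibling::node()[1][self::BR]]";

    // Hypothetical helper: evaluates an XPath against SAMPLE and returns the trimmed node values.
    static List<String> select(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Brackets make the whitespace-only nodes visible as [].
        for (String value : select(BETWEEN_BR)) {
            System.out.println("[" + value + "]");
        }
    }
}
```

The printed list should contain the two numbers and "More information" mixed with empty entries, which are the whitespace-only nodes sitting between consecutive <BR/> tags.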
You can get rid of the whitespace-only nodes by adding [normalize-space(.) != ''] to the end of the expression (which will then return only three nodes). And you can select the node you want using a positional predicate at the end of the expression ([1] to select the first node).
The expression below selects the text node containing the value 2.000:
//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS="ntdefault"]
/text()[preceding-sibling::node()[1][self::BR]
and following-sibling::node()[1][self::BR]][normalize-space(.) != ''][2]
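The filtered expression can be tried the same way. This is only a sketch under stated assumptions: it uses the JDK's XPath engine rather than Selenium, the markup is a trimmed, well-formed copy of the sample (<BR/> instead of <BR>), and the names BrNumberLookup and evaluate are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class BrNumberLookup {
    // Assumption: a trimmed, well-formed copy of the question's markup (<BR> written as <BR/>).
    static final String SAMPLE =
        "<DIV class=\"pagebodydiv\">"
        + "<TABLE CLASS=\"datadisplaytable\"><TR><TD CLASS=\"ntdefault\">\n"
        + "Further information on subject\n<BR/>\n3.000\n<BR/>\n2.000\n<BR/>\n<BR/>\n<BR/>\n"
        + "<BR/>\n<BR/>\nMore information\n<BR/>\n<BR/>\n"
        + "</TD></TR></TABLE></DIV>";

    // The answer's expression with the whitespace filter, but without a positional predicate.
    static final String NON_BLANK_BETWEEN_BR =
        "//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS='ntdefault']"
        + "/text()[preceding-sibling::node()[1][self::BR]"
        + " and following-sibling::node()[1][self::BR]][normalize-space(.) != '']";

    // Hypothetical helper: returns the trimmed string result of an XPath evaluated against SAMPLE.
    static String evaluate(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The two-argument evaluate() converts the result to a string
        // (for a node-set, the string value of its first node).
        return xpath.evaluate(expression, new InputSource(new StringReader(SAMPLE))).trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate("count(" + NON_BLANK_BETWEEN_BR + ")")); // remaining nodes
        System.out.println(evaluate(NON_BLANK_BETWEEN_BR + "[2]"));          // second remaining node
    }
}
```

Changing the trailing [2] to [1] or [3] picks the other remaining nodes. With Selenium itself, one workaround would be to locate the TD element and split its text in Java, since findElement is expected to return elements rather than text nodes.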
Note: I'm assuming your source actually has tag names in uppercase, since in XPath <TD> is not the same as <td>. I'm not sure how tolerant Selenium is about this when parsing HTML.
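The case sensitivity mentioned in the note can be checked with a small sketch. It assumes the markup is parsed as XML by the JDK's XPath engine; a browser-backed Selenium session parsing real HTML may normalise tag names differently, so treat this as an illustration of the XPath rule only. The class name XPathCaseDemo and the helper count are made up for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathCaseDemo {
    // Hypothetical helper: counts the nodes matched by a path in the given XML text.
    static int count(String xml, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double matches = (Double) xpath.evaluate(
            "count(" + path + ")", new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return matches.intValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<TABLE><TR><TD>3.000</TD></TR></TABLE>";
        System.out.println("//TD matches: " + count(xml, "//TD"));
        System.out.println("//td matches: " + count(xml, "//td"));
    }
}
```

If the uppercase expressions return nothing in Selenium, trying the same paths with lowercase element names is a cheap first check.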