Extract Text from BR tags


Question

I have been able to extract text using Selenium before, but I'm having trouble extracting just the numbers between <BR> tags. Here is a sample of the HTML code.

<DIV class="pagebodydiv">
    <TABLE  CLASS="datadisplaytable" SUMMARY="This table will display needed information." WIDTH="100%">
<TR>
<TD CLASS="nttitle" scope="colgroup" >Working Title</A></TD>
</TR>
<TR>
<TD CLASS="ntdefault">
 Further information on subject
<BR>
    3.000
<BR>
    2.000  
<BR>
<BR>
<BR>
<BR>
<BR>
More information
<BR>
<BR>
</TABLE>

So far I have tried using:

WebElement creditinfo = driver.findElement(By.xpath("//div[@class='pagebodydiv']/text()[preceding-sibling::br]"));

Elements numInfo = doc.select("br");

However, I keep running into a NoSuchElementException, an InvalidSelectorException, or the call just doesn't return anything. Any ideas on how I can get the information?

Recommended answer

You can actually select the text nodes between <BR> tags. In HTML (as opposed to XHTML), <BR> behaves like a self-closing tag (<br/>), so the text that follows it is a sibling text node rather than the tag's content. Based on that behaviour, you can select every text node that has a <BR> tag immediately before and immediately after it using:

//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS="ntdefault"]
/text()[preceding-sibling::node()[1][self::BR] 
        and following-sibling::node()[1][self::BR]]

That also selects the blank (whitespace-only) text nodes between consecutive <BR> tags, as well as the text that is not a number.
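As a rough check of that claim, the sketch below evaluates the expression with the JDK's built-in XPath engine rather than Selenium. It assumes a hand-written, well-formed XML stand-in for the cell in the question (each <BR> written as <BR/>); the class name BrTextNodesDemo and the SAMPLE string are made up for illustration, and a real browser DOM may be structured differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class BrTextNodesDemo {
    // Assumption: a well-formed XML stand-in for the question's TD cell.
    static final String SAMPLE =
          "<TABLE CLASS='datadisplaytable'><TR><TD CLASS='ntdefault'>\n"
        + " Further information on subject\n<BR/>\n    3.000\n<BR/>\n    2.000  \n"
        + "<BR/>\n<BR/>\n<BR/>\n<BR/>\n<BR/>\nMore information\n<BR/>\n<BR/>\n"
        + "</TD></TR></TABLE>";

    // The answer's expression: text nodes with a BR directly before and after.
    static final String BETWEEN_BR =
          "//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS='ntdefault']"
        + "/text()[preceding-sibling::node()[1][self::BR]"
        + " and following-sibling::node()[1][self::BR]]";

    // Returns the raw value of every text node matched by the expression.
    static List<String> select(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Brackets make the whitespace-only matches visible as "[]".
        for (String text : select(BETWEEN_BR)) {
            System.out.println("[" + text.trim() + "]");
        }
    }
}
```

On this stand-in the matches include the two numbers, the "More information" line, and several whitespace-only nodes, which is why the extra filtering is needed.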

You can get rid of the whitespace-only nodes by adding [normalize-space(.) != ''] to the end of the expression, which then returns only three nodes. You can then pick the node you want with a positional predicate at the end of the expression (for example, [1] selects the first node).

The expression below selects the text node containing the value 2.000:

//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS="ntdefault"]
/text()[preceding-sibling::node()[1][self::BR] 
        and following-sibling::node()[1][self::BR]][normalize-space(.) != ''][2]
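The filtered expression can be tried outside the browser in the same way. This is a minimal sketch using the JDK's XPath engine on an assumed well-formed XML copy of the cell; BrNumberDemo, SAMPLE, countNonBlank and textAt are illustrative names, not part of Selenium or of the original answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class BrNumberDemo {
    // Assumption: a well-formed XML stand-in for the question's TD cell.
    static final String SAMPLE =
          "<TABLE CLASS='datadisplaytable'><TR><TD CLASS='ntdefault'>\n"
        + " Further information on subject\n<BR/>\n    3.000\n<BR/>\n    2.000  \n"
        + "<BR/>\n<BR/>\n<BR/>\n<BR/>\n<BR/>\nMore information\n<BR/>\n<BR/>\n"
        + "</TD></TR></TABLE>";

    // Text nodes between two BR elements, with whitespace-only nodes removed.
    static final String NON_BLANK =
          "//TABLE[@CLASS='datadisplaytable']/TR/TD[@CLASS='ntdefault']"
        + "/text()[preceding-sibling::node()[1][self::BR]"
        + " and following-sibling::node()[1][self::BR]]"
        + "[normalize-space(.) != '']";

    // Evaluates an expression against SAMPLE and returns its trimmed string value.
    static String eval(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of non-blank text nodes that sit between two BR elements.
    static int countNonBlank() {
        return (int) Double.parseDouble(eval("count(" + NON_BLANK + ")"));
    }

    // Text of the node at the given 1-based position (the positional predicate).
    static String textAt(int position) {
        return eval(NON_BLANK + "[" + position + "]");
    }

    public static void main(String[] args) {
        System.out.println("non-blank nodes: " + countNonBlank());
        System.out.println("second value as a number: " + Double.parseDouble(textAt(2)));
    }
}
```

One caveat for Selenium itself: findElement(By.xpath(...)) has to resolve to an element, so an expression ending in text() is a likely cause of the InvalidSelectorException mentioned in the question. Evaluating the text-node expression with a separate parser, as sketched here, is one possible way around that.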

Note: I'm assuming your source actually has tag names in uppercase, since in XPath <TD> is not the same as <td>. I'm not sure how tolerant Selenium is about this when it parses HTML.
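The case sensitivity mentioned in the note can be seen with a small sketch. It uses the JDK's XPath engine on an XML document, so it only illustrates plain XPath over XML; how a browser driven by Selenium treats tag-name case in an HTML page is not covered here. XPathCaseDemo and its SAMPLE string are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XPathCaseDemo {
    // Assumption: a tiny XML document with uppercase element and attribute names.
    static final String SAMPLE =
        "<TABLE CLASS='datadisplaytable'><TR><TD CLASS='ntdefault'>3.000</TD></TR></TABLE>";

    // Counts the nodes selected by a node-set expression.
    static int count(String nodeSetExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                    .evaluate("count(" + nodeSetExpression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same path, different letter case for element and attribute names.
        System.out.println("uppercase: " + count("//TD[@CLASS='ntdefault']"));
        System.out.println("lowercase: " + count("//td[@class='ntdefault']"));
    }
}
```

If the lowercase form finds nothing against a real page while the uppercase form works (or the reverse), adjusting the case of the names in the expression is the first thing to try.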
