GPath to find if a table header contains a matching string
Problem description
I'm parsing an HTML file into a well-formed XML document using NekoHTML parser. However I can't quite figure out the GPath so that I can identify the table that has the "Settings" string.
def parser = new org.cyberneko.html.parsers.SAXParser()
parser.setFeature('http://xml.org/sax/features/namespaces', false)
def html =
'''
<html>
<head>
<title>Hiya!</title>
</head>
<body>
<table>
<tr>
<th colspan='3'>Settings</th>
<td>First cell r1</td>
<td>Second cell r1</td>
</tr>
</table>
<table>
<tr>
<th colspan='3'>Other Settings</th>
<td>First cell r2</td>
<td>Second cell r2</td>
</tr>
</table>
</body>
</html>
'''
def slurper = new XmlSlurper(parser)
def page = slurper.parseText(html)
In this sample, the first table should be selected so that I can iterate over other row values in it. Can someone help me with this GPath please?
Edit: Side question - why does
println page.HTML.HEAD.TITLE
print an empty string, shouldn't it return the title?
Recommended answer
To get the table with 'Settings' in the header, you should be able to do:
def settingsTableNode = page.BODY.TABLE.find { table ->
table.TBODY.TR.TH.text() == 'Settings'
}
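If the exact nesting is uncertain (for example, whether the parser inserts a TBODY element may depend on the NekoHTML version), a depth-first search is one way to avoid hard-coding the path. The following is only a sketch under stated assumptions: it uses a plain XmlSlurper on a hand-written, well-formed stand-in for the parsed page, with upper-case element names, so that it runs without the NekoHTML dependency. The variable names settingsTable and cells are illustrative, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper  // assumes Groovy 3+; on Groovy 2.x drop this import

// Assumption: a well-formed stand-in for the NekoHTML output, upper-case names.
// The first table has a TBODY and the second does not, to show the search
// does not depend on it.
def xml = '''
<HTML>
  <HEAD><TITLE>Hiya!</TITLE></HEAD>
  <BODY>
    <TABLE><TBODY><TR>
      <TH colspan="3">Settings</TH><TD>First cell r1</TD><TD>Second cell r1</TD>
    </TR></TBODY></TABLE>
    <TABLE><TR>
      <TH colspan="3">Other Settings</TH><TD>First cell r2</TD><TD>Second cell r2</TD>
    </TR></TABLE>
  </BODY>
</HTML>
'''
def page = new XmlSlurper().parseText(xml)

// '**' walks the tree depth-first; pick the first TABLE that contains
// a TH whose trimmed text is exactly 'Settings'
def settingsTable = page.'**'.find { node ->
    node.name() == 'TABLE' &&
        node.'**'.any { it.name() == 'TH' && it.text().trim() == 'Settings' }
}

// Collect the text of every TD inside the selected table
def cells = settingsTable.'**'.findAll { it.name() == 'TD' }*.text()
cells.each { println it }
```

Because the comparison is an exact match on the trimmed header text, the table headed 'Other Settings' is skipped; a substring check such as contains('Settings') would match both tables.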
As for the side question: page already points to the root of the document (the HTML element), so you don't need the HTML step in the path. All you should need to do is:
println page.HEAD.TITLE
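The root-element behaviour can be checked with a small sketch. As an assumption, it uses a plain XmlSlurper on a minimal well-formed document with upper-case element names rather than the NekoHTML parser, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlSlurper  // assumes Groovy 3+; on Groovy 2.x drop this import

// Assumption: minimal well-formed stand-in for the parsed page
def page = new XmlSlurper().parseText(
    '<HTML><HEAD><TITLE>Hiya!</TITLE></HEAD><BODY/></HTML>')

// The object returned by parseText is the root element itself
def rootName = page.name()

// Path starting directly below the root finds the title
def title = page.HEAD.TITLE.text()

// Asking the root for a child called HTML matches nothing,
// so the rest of the path yields an empty result
def wrongPath = page.HTML.HEAD.TITLE.text()

println "root=${rootName}, title=${title}, wrongPath='${wrongPath}'"
```

A GPath expression that matches nothing does not throw; it yields an empty result whose text is an empty string, which is why the original println showed nothing.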