Using XmlSlurper: How to select sub-elements while iterating over a GPathResult

Published: 2018/5/30 9:47:36 · Tags: html parsing groovy xmlslurper


Problem description

I am writing an HTML parser that uses TagSoup to pass a well-formed structure to XmlSlurper.

Here's the generalised code:
```groovy
// TagSoup is not bundled with Groovy; this @Grab assumes it can be fetched from Maven Central
@Grab('org.ccil.cowan.tagsoup:tagsoup:1.2.1')
import org.ccil.cowan.tagsoup.Parser

def htmlText = """
<html>
<body>
<div id="divId" class="divclass">
<h2>Heading 2</h2>
<ol>
<li><h3><a class="box" href="#href1">href1 link text</a> <span>extra stuff</span></h3><address>Here is the address<span>Telephone number: <strong>telephone</strong></span></address></li>
<li><h3><a class="box" href="#href2">href2 link text</a> <span>extra stuff</span></h3><address>Here is another address<span>Another telephone: <strong>0845 1111111</strong></span></address></li>
</ol>
</div>
</body>
</html>
"""

// TagSoup turns the HTML into a well-formed tree that XmlSlurper can navigate
def html = new XmlSlurper(new Parser()).parseText(htmlText)

// Intended: visit each <li> of the 'divclass' div in turn and print its href and address
html.'**'.grep { it.@class == 'divclass' }.ol.li.each { linkItem ->
    def link = linkItem.h3.a.@href
    def address = linkItem.address.text()
    println "$link: $address\n"
}
```
I would expect the each to let me select each 'li' in turn so I can retrieve the corresponding href and address details. Instead, I am getting this output:
```
#href1#href2: Here is the addressTelephone number: telephoneHere is another addressAnother telephone: 0845 1111111
```
I've checked various examples on the web, but these either deal with XML or are one-liners like "retrieve all links from this file". It seems that the it.h3.a.@href expression is collecting all hrefs in the text, even though I'm passing it a reference to the parent 'li' node.

Can you let me know:

Why I'm getting the output shown

How I can retrieve the href/address pairs for each 'li' item

Thanks.
Solution
Replace grep with find:
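```groovy
// find returns the first matching node itself (a NodeChild), so .ol.li yields the individual <li> nodes
html.'**'.find { it.@class == 'divclass' }.ol.li.each { linkItem ->
    def link = linkItem.h3.a.@href
    def address = linkItem.address.text()
    println "$link: $address\n"
}
```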
Then you'll get:

```
#href1: Here is the addressTelephone number: telephone

#href2: Here is another addressAnother telephone: 0845 1111111
```
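If you would rather collect the pairs than print them, the find-based approach can be sketched as below. This is a minimal variant, not the answer's own code: it assumes Groovy 3.x or 4.x (where XmlSlurper lives in the groovy.xml package) and skips TagSoup by parsing the sample markup directly, which only works because this particular markup is already well-formed XML. The helper name extractPairs is hypothetical.

```groovy
import groovy.xml.XmlSlurper

// Sample markup from the question; it is well-formed XML, so a plain XmlSlurper can parse it
// (real-world HTML would still need TagSoup, as in the question)
def htmlText = '''<html><body><div id="divId" class="divclass"><h2>Heading 2</h2><ol>
<li><h3><a class="box" href="#href1">href1 link text</a> <span>extra stuff</span></h3><address>Here is the address<span>Telephone number: <strong>telephone</strong></span></address></li>
<li><h3><a class="box" href="#href2">href2 link text</a> <span>extra stuff</span></h3><address>Here is another address<span>Another telephone: <strong>0845 1111111</strong></span></address></li>
</ol></div></body></html>'''

// Hypothetical helper: returns one [href:, address:] map per <li> of the first matching div
List<Map<String, String>> extractPairs(String markup) {
    def html = new XmlSlurper().parseText(markup)
    // find returns the first matching node itself (or null if none matches);
    // safe navigation turns "no matching div" into null instead of a NullPointerException
    def items = html.'**'.find { it.@class == 'divclass' }?.ol?.li
    if (items == null) return []
    // each element seen by collect is a single <li>, so the paths below stay scoped to it
    items.collect { li ->
        [href: li.h3.a.@href.text(), address: li.address.text()]
    }
}

def pairs = extractPairs(htmlText)
pairs.each { println "${it.href}: ${it.address}" }
```

Calling .text() on the attribute and the address turns the GPath results into plain strings, so the returned maps no longer depend on the parsed document.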
grep returns an ArrayList of all matching nodes, whereas find returns the first matching node itself, a NodeChild:
```groovy
println html.'**'.grep { it.@class == 'divclass' }.getClass()
println html.'**'.find { it.@class == 'divclass' }.getClass()
```
results in:

```
class java.util.ArrayList
class groovy.util.slurpersupport.NodeChild
```
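A quick way to see what this means for the loop is to compare how many elements each chain actually iterates over. This is a sketch, not code from the original answer; it assumes Groovy 3.x or 4.x (groovy.xml.XmlSlurper) and parses a trimmed, well-formed copy of the sample markup without TagSoup. The names viaGrep and viaFind are illustrative.

```groovy
import groovy.xml.XmlSlurper

// Trimmed copy of the question's markup (well-formed, so no TagSoup needed here)
def html = new XmlSlurper().parseText('''<html><body><div class="divclass"><ol>
<li><h3><a href="#href1">one</a></h3></li>
<li><h3><a href="#href2">two</a></h3></li>
</ol></div></body></html>''')

// grep collects matches into a java.util.List; reading .ol and .li from a List
// applies the property to each element and collects the results into a new List
def viaGrep = html.'**'.grep { it.@class == 'divclass' }.ol.li
// find returns the matching node itself, so .ol.li is a GPathResult over the individual <li> nodes
def viaFind = html.'**'.find { it.@class == 'divclass' }.ol.li

println "grep: ${viaGrep.getClass().name}, outer size ${viaGrep.size()}, inner size ${viaGrep[0].size()}"
println "find: size ${viaFind.size()}"

// In the question's loop, linkItem is viaGrep[0], which holds both <li> nodes at once
println viaGrep[0].h3.a.@href                      // prints #href1#href2
println viaFind.collect { it.h3.a.@href.text() }   // prints [#href1, #href2]
```

The outer list has a single element, which is why each runs only once and the attribute text of both items is concatenated.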
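Calling .ol.li on that ArrayList applies the property access to each element and collects the results, so you end up with a list holding a single element that contains both li nodes, and the outer each runs only once. Thus, if you want to keep grep, you can nest another each so that the inner closure receives the individual li nodes: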
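```groovy
html.'**'.grep { it.@class == 'divclass' }.ol.li.each {
    // 'it' holds all <li> children of the matched div; the inner each visits them one by one
    it.each { linkItem ->
        def link = linkItem.h3.a.@href
        def address = linkItem.address.text()
        println "$link: $address\n"
    }
}
```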
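One reason to keep grep is that, unlike find, it does not stop at the first matching div. As an alternative to nesting each, the sketch below uses Groovy's collectMany to flatten the li items of every matching div into one list of pairs. It is not part of the original answer; it assumes Groovy 3.x or 4.x and a well-formed sample parsed without TagSoup, and the second div with its #href3 link is made up purely for illustration.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical sample with two divs sharing the class, to show grep picking up both
def html = new XmlSlurper().parseText('''<html><body>
<div class="divclass"><ol>
<li><h3><a href="#href1">one</a></h3><address>Address 1</address></li>
<li><h3><a href="#href2">two</a></h3><address>Address 2</address></li>
</ol></div>
<div class="divclass"><ol>
<li><h3><a href="#href3">three</a></h3><address>Address 3</address></li>
</ol></div>
</body></html>''')

// grep keeps every matching div; collectMany concatenates the per-div lists of pairs
def pairs = html.'**'.grep { it.@class == 'divclass' }.collectMany { div ->
    // 'div' is a single NodeChild here, so div.ol.li iterates the individual <li> nodes
    div.ol.li.collect { li -> [href: li.h3.a.@href.text(), address: li.address.text()] }
}

pairs.each { println "${it.href}: ${it.address}" }
```

If only one div can ever match, find is simpler; collectMany is only worth it when several containers share the class.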
Long story short, in your case, use find rather than grep.
