Groovy XMLSlurper issue
Problem description
I want to use XmlSlurper to parse an HTML document that I read using HTTPBuilder. Initially I tried to do it this way:
def response = http.get(path: "index.php", contentType: TEXT)
def slurper = new XmlSlurper()
def xml = slurper.parse(response)
But it produces an exception:
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
I found a workaround: serving cached DTD files locally. I came across a simple class implementation that should help here:
import org.apache.log4j.Logger  // assumption: Logger here is the log4j 1.x class

class CachedDTD {
    /**
     * Return DTD 'systemId' as InputSource.
     * @param publicId
     * @param systemId
     * @return InputSource for locally cached DTD.
     */
    def static entityResolver = [
        resolveEntity: { publicId, systemId ->
            try {
                // Keep only the file name of the requested DTD and look it up under dtd/
                String dtd = "dtd/" + systemId.split("/").last()
                Logger.getRootLogger().debug "DTD path: ${dtd}"
                new org.xml.sax.InputSource(CachedDTD.class.getResourceAsStream(dtd))
            } catch (e) {
                //e.printStackTrace()
                Logger.getRootLogger().fatal "Fatal error", e
                null
            }
        }
    ] as org.xml.sax.EntityResolver
}
My package tree is shown in this image: https://i.stack.imgur.com/1gqF9.jpg
I also slightly modified the code that parses the response, so it looks like this:
def response = http.get(path: "index.php", contentType: TEXT)
def slurper = new XmlSlurper()
slurper.setEntityResolver(org.yuri.CachedDTD.entityResolver)
def xml = slurper.parse(response)
But now I'm getting java.net.MalformedURLException. The DTD path logged by the CachedDTD entityResolver is org/yuri/dtd/xhtml1-transitional.dtd, and I can't get it working...
There is an HTML parser, NekoHTML, that you could use in conjunction with XmlSlurper to address these problems:
http://sourceforge.net/projects/nekohtml/
Sample usage here:
http://groovy.codehaus.org/Testing+Web+Applications
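To illustrate the idea, here is a minimal sketch of handing a NekoHTML parser to XmlSlurper. It is not taken from the linked page. The @Grab coordinates and version, the inline sample page, and the choice to lower-case element names are all assumptions made for the example. The import of groovy.xml.XmlSlurper assumes Groovy 3 or later; on Groovy 2.x that line can be dropped, since groovy.util.XmlSlurper is imported by default.

```groovy
// Assumption: NekoHTML is pulled in through Grape; the version is only illustrative.
@Grab('net.sourceforge.nekohtml:nekohtml:1.9.22')
import groovy.xml.XmlSlurper
import org.cyberneko.html.parsers.SAXParser

// Hypothetical page standing in for the text returned by HTTPBuilder.
// It carries the XHTML DOCTYPE from the question and is deliberately not well-formed.
String page = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>Index</title></head>
<body><p>Hello<br>world</body></html>'''

// NekoHTML's SAX parser is an XMLReader, so XmlSlurper can use it directly.
def neko = new SAXParser()
// NekoHTML upper-cases element names by default; ask for lower case instead.
neko.setProperty('http://cyberneko.org/html/properties/names/elems', 'lower')

def html = new XmlSlurper(neko).parse(new StringReader(page))

String title = html.head.title.text()
int paragraphs = html.body.p.size()
println "title=${title}, paragraphs=${paragraphs}"
```

Because NekoHTML is an HTML parser rather than a validating XML parser, this route should not need the W3C DTD at all, so the CachedDTD resolver would no longer be required. With the question's code, the response from http.get would take the place of the StringReader.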