Parse HTML using an Ant Script
Question
I need to retrieve some values from an HTML file. I need to use Ant so that I can use these values in other parts of my script.
Can this even be achieved in Ant?
Recommended answer
As stated in the other answers, you can't do this in "pure" XML. You need to embed a programming language. My personal favourite is Groovy; its integration with ANT is excellent.
Here's a sample which retrieves the logo URL from the Groovy homepage:
parse:
print:
[echo]
[echo] Logo URL: http://groovy.codehaus.org/images/groovy-logo-medium.png
[echo]
build.xml
The build uses the Ivy plug-in to retrieve all 3rd party dependencies.
<project name="demo" default="print" xmlns:ivy="antlib:org.apache.ivy.ant">
    <target name="resolve">
        <ivy:resolve/>
        <ivy:cachepath pathid="build.path" conf="build"/>
    </target>
    <target name="parse" depends="resolve">
        <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>
        <groovy>
            import org.htmlcleaner.*
            def address = 'http://groovy.codehaus.org/'
            // Clean any messy HTML
            def cleaner = new HtmlCleaner()
            def node = cleaner.clean(address.toURL())
            // Convert from HTML to XML
            def props = cleaner.getProperties()
            def serializer = new SimpleXmlSerializer(props)
            def xml = serializer.getXmlAsString(node)
            // Parse the XML into a document we can work with
            def page = new XmlSlurper(false,false).parseText(xml)
            // Retrieve the logo URL
            properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
        </groovy>
    </target>
    <target name="print" depends="parse">
        <echo>
Logo URL: ${logo}
        </echo>
    </target>
</project>
The parsing logic is pure Groovy programming. I love the way you can easily walk the page's DOM tree:
// Retrieve the logo URL
properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
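The indexed path above depends on the exact nesting of the page's div elements, so it stops working as soon as the layout changes. A looser lookup is possible with GPath's depth-first traversal. What follows is only a sketch: the inline XML is a made-up stand-in for the cleaned page, and the "logo" substring match is an assumption about how the image can be recognised. On Groovy 4 and later, XmlSlurper lives in groovy.xml and needs an explicit import; with the Groovy 1.8 used in this build it is available by default.

```groovy
// Hypothetical stand-in for the XML produced by HtmlCleaner (not the real page)
def xml = '''<html>
  <body>
    <div><div><img src="images/banner.png" alt="banner"/></div></div>
    <div><img src="images/groovy-logo-medium.png" alt="logo"/></div>
  </body>
</html>'''

// Same non-validating, non-namespace-aware parser as in the build script
def page = new XmlSlurper(false, false).parseText(xml)

// Depth-first search for the first <img> whose src mentions "logo",
// instead of hard-coding the position of every enclosing <div>
def img = page.depthFirst().find { it.name() == 'img' && it.@src.text().contains('logo') }

// text() turns the GPath result into a plain String; fall back to '' if nothing matched
def logo = (img != null) ? img.@src.text() : ''
println "Logo URL: ${logo}"
```

Inside the groovy task the last step would instead be an assignment to properties["logo"], as in the build file.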
ivy.xml
Ivy is similar to Maven. It manages your dependencies on 3rd party software. Here it's being used to pull down Groovy and the HtmlCleaner library that the Groovy logic uses:
<ivy-module version="2.0">
    <info organisation="org.myspotontheweb" module="demo"/>
    <configurations defaultconfmapping="build->default">
        <conf name="build" description="ANT tasks"/>
    </configurations>
    <dependencies>
        <dependency org="org.codehaus.groovy" name="groovy-all" rev="1.8.2"/>
        <dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2"/>
    </dependencies>
</ivy-module>
How to install Ivy
Ivy is a standard ANT plugin. Download its jar and place it in one of the following directories:
$HOME/.ant/lib
$ANT_HOME/lib
I don't know why the ANT project doesn't ship with Ivy.