Parse HTML using an Ant Script

Question

I need to retrieve some values from an HTML file. I need to do this in Ant so that I can use these values in other parts of my script.

Can this even be achieved in Ant?

Recommended answer

As stated in the other answers, you can't do this in "pure" XML. You need to embed a programming language. My personal favourite is Groovy; its integration with ANT is excellent.

Here's a sample which retrieves the logo URL from the Groovy homepage:

parse:

print:
     [echo] 
     [echo]         Logo URL: http://groovy.codehaus.org/images/groovy-logo-medium.png
     [echo]     

build.xml

The build uses the Ivy plug-in to retrieve all third-party dependencies.

<project name="demo" default="print" xmlns:ivy="antlib:org.apache.ivy.ant">

    <target name="resolve">
        <ivy:resolve/>
        <ivy:cachepath pathid="build.path" conf="build"/>
    </target>

    <target name="parse" depends="resolve">
        <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>

        <groovy>
        import org.htmlcleaner.*

        def address = 'http://groovy.codehaus.org/'

        // Clean any messy HTML
        def cleaner = new HtmlCleaner()
        def node = cleaner.clean(address.toURL())

        // Convert from HTML to XML
        def props = cleaner.getProperties()
        def serializer = new SimpleXmlSerializer(props)
        def xml = serializer.getXmlAsString(node)

        // Parse the XML into a document we can work with
        def page = new XmlSlurper(false,false).parseText(xml)

        // Retrieve the logo URL
        properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
        </groovy>
    </target>

    <target name="print" depends="parse">
        <echo>
        Logo URL: ${logo}
        </echo>
    </target>

</project>
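
Since the point of the question is to reuse the extracted value elsewhere in the script, here is a minimal sketch of one more target that consumes the `logo` property set by `parse`. The target name `download-logo` and the output file name `logo.png` are made up for illustration; the target is meant to be added inside the `<project>` element above.

```xml
<!-- Hypothetical extra target: reuses the "logo" property set by the parse target -->
<target name="download-logo" depends="parse">
    <!-- Stop early with a clear message if the Groovy script did not set the property -->
    <fail unless="logo" message="The logo property was not set by the parse target"/>

    <!-- Download the image that the property points to (file name is an assumption) -->
    <get src="${logo}" dest="${basedir}/logo.png"/>
</target>
```

Any task that accepts `${...}` property expansion can use the value in the same way.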

The parsing logic is pure Groovy programming. I love the way you can easily walk the page's DOM tree:

// Retrieve the logo URL
properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
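
Walking the tree by position is tied to the exact page layout, so it stops working if a `div` is added or removed. As a hedged alternative, the sketch below searches the whole document for the first `img` whose `src` contains the text "logo". That matching rule and the target name `parse-by-search` are assumptions, not part of the original build; the target relies on the `resolve` target from build.xml and belongs inside the same `<project>` element.

```xml
<!-- Hypothetical variant of the parse target: search instead of positional indexing -->
<target name="parse-by-search" depends="resolve">
    <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>

    <groovy>
    import org.htmlcleaner.*

    // Clean the HTML and convert it to XML, as in the original parse target
    def cleaner = new HtmlCleaner()
    def node = cleaner.clean('http://groovy.codehaus.org/'.toURL())
    def xml = new SimpleXmlSerializer(cleaner.getProperties()).getXmlAsString(node)
    def page = new XmlSlurper(false, false).parseText(xml)

    // '**' is the GPath shorthand for a depth-first traversal of every element
    def images = page.'**'.findAll { it.name() == 'img' }

    // Assumption: the logo is the first image whose src mentions "logo"
    def img = images.find { it.@src.text().contains('logo') }
    if (img != null) {
        properties["logo"] = img.@src.text()
    }
    </groovy>
</target>
```

The two filters are chained rather than combined with a logical AND operator because a bare ampersand would have to be escaped inside an XML build file.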

ivy.xml

Ivy is similar to Maven: it manages your dependencies on third-party software. Here it is used to pull down Groovy and the HtmlCleaner library that the Groovy logic uses:

<ivy-module version="2.0">
    <info organisation="org.myspotontheweb" module="demo"/>
    <configurations defaultconfmapping="build->default">
        <conf name="build" description="ANT tasks"/>
    </configurations>
    <dependencies>
        <dependency org="org.codehaus.groovy" name="groovy-all" rev="1.8.2"/>
        <dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2"/>
    </dependencies>
</ivy-module>

How to install Ivy

Ivy is a standard ANT plugin. Download its jar and place it in one of the following directories:

$HOME/.ant/lib
$ANT_HOME/lib
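
The download can also be scripted from Ant itself. The following is a sketch only: the target name `bootstrap`, the Ivy version (2.5.2), and the Maven Central URL are assumptions to adjust for your environment. Run it once, then start Ant again so that the new jar is picked up.

```xml
<!-- Hypothetical one-off target that installs the Ivy jar into $HOME/.ant/lib -->
<target name="bootstrap" description="Install the Ivy jar">
    <mkdir dir="${user.home}/.ant/lib"/>
    <!-- Version and URL are assumptions; pick the release you actually want -->
    <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.2/ivy-2.5.2.jar"
         dest="${user.home}/.ant/lib/ivy.jar"/>
</target>
```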

I don't know why the ANT project doesn't ship with Ivy.
