Parse HTML using an Ant Script

Question

I need to retrieve some values from an HTML file. I need to use Ant so I can use these values in other parts of my script.

Can this even be achieved in Ant?

Recommended answer

As stated in the other answers, you can't do this in "pure" XML; you need to embed a programming language. My personal favourite is Groovy: its integration with Ant is excellent.

Here's a sample that retrieves the logo URL from the Groovy homepage:

parse:

print:
     [echo] 
     [echo]         Logo URL: http://groovy.codehaus.org/images/groovy-logo-medium.png
     [echo]     

build.xml

The build uses the Ivy plug-in to retrieve all third-party dependencies.

<project name="demo" default="print" xmlns:ivy="antlib:org.apache.ivy.ant">

    <target name="resolve">
        <ivy:resolve/>
        <ivy:cachepath pathid="build.path" conf="build"/>
    </target>

    <target name="parse" depends="resolve">
        <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>

        <groovy>
        import org.htmlcleaner.*

        def address = 'http://groovy.codehaus.org/'

        // Clean any messy HTML
        def cleaner = new HtmlCleaner()
        def node = cleaner.clean(address.toURL())

        // Convert from HTML to XML
        def props = cleaner.getProperties()
        def serializer = new SimpleXmlSerializer(props)
        def xml = serializer.getXmlAsString(node)

        // Parse the XML into a document we can work with
        def page = new XmlSlurper(false,false).parseText(xml)

        // Retrieve the logo URL
        properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
        </groovy>
    </target>

    <target name="print" depends="parse">
        <echo>
        Logo URL: ${logo}
        </echo>
    </target>

</project>
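
Since the question asks about reusing the extracted value elsewhere in the script, here is a minimal sketch of one more target that consumes `${logo}`. It is not part of the original answer: the target name `download-logo` and the output file `logo.png` are hypothetical, and the fragment is assumed to sit inside the `<project>` element above.

```xml
<!-- Hypothetical extra target: the name and the destination file are assumptions -->
<target name="download-logo" depends="parse">
    <!-- Stop early if the parse step left the property unset or empty -->
    <fail message="The parse target did not produce a logo URL">
        <condition>
            <or>
                <not><isset property="logo"/></not>
                <equals arg1="${logo}" arg2=""/>
            </or>
        </condition>
    </fail>

    <!-- Reuse the value set by the Groovy script like any other Ant property -->
    <get src="${logo}" dest="${basedir}/logo.png"/>
</target>
```

Anything that depends on `parse` can read the property this way, because the Groovy task writes it into the normal Ant project properties.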

The parsing logic is pure Groovy. I love the way you can easily walk the page's DOM tree:

// Retrieve the logo URL
properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
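
An index-based path like this one depends on the exact page layout. As a hedged alternative, the sketch below searches the tree for the first `img` whose `src` contains "logo". The target name `parse-by-search` and the "logo" substring test are assumptions, not part of the original answer. The script is wrapped in CDATA because it uses `&&`, which is not legal in plain XML text.

```xml
<!-- Hypothetical variant of the parse target: the name and the match rule are assumptions -->
<target name="parse-by-search" depends="resolve">
    <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>

    <groovy><![CDATA[
    import org.htmlcleaner.*

    // Same clean-up and XML conversion steps as the original script
    def cleaner = new HtmlCleaner()
    def node = cleaner.clean('http://groovy.codehaus.org/'.toURL())
    def xml = new SimpleXmlSerializer(cleaner.getProperties()).getXmlAsString(node)
    def page = new XmlSlurper(false, false).parseText(xml)

    // Depth-first search for the first img element whose src mentions "logo"
    def img = page.depthFirst().find { it.name() == 'img' && it.@src.text().contains('logo') }

    // Only set the Ant property when a match was found
    if (img != null) {
        properties['logo'] = img.@src.text()
    }
    ]]></groovy>
</target>
```

This trades precision for resilience: it keeps working if the surrounding `div` nesting changes, but it may pick up a different image if several match.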

ivy.xml

Ivy is similar to Maven: it manages your dependencies on third-party software. Here it's being used to pull down Groovy and the HtmlCleaner library that the Groovy logic uses:

<ivy-module version="2.0">
    <info organisation="org.myspotontheweb" module="demo"/>
    <configurations defaultconfmapping="build->default">
        <conf name="build" description="ANT tasks"/>
    </configurations>
    <dependencies>
        <dependency org="org.codehaus.groovy" name="groovy-all" rev="1.8.2"/>
        <dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2"/>
    </dependencies>
</ivy-module>

How to install Ivy

Ivy is a standard Ant plugin. Download its jar and place it in one of the following directories:

$HOME/.ant/lib
$ANT_HOME/lib
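
The download can also be scripted from Ant itself. The fragment below is a sketch, not part of the original answer: the target name `bootstrap`, the property name `ivy.install.version`, and the version number and Maven Central URL are all assumptions that you should adjust to the Ivy release you want.

```xml
<!-- Hypothetical helper: the version value and the download URL are assumptions -->
<property name="ivy.install.version" value="2.5.2"/>

<target name="bootstrap" description="Download the Ivy jar into the user's Ant lib directory">
    <!-- ${user.home}/.ant/lib is the $HOME/.ant/lib directory listed above -->
    <mkdir dir="${user.home}/.ant/lib"/>
    <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/${ivy.install.version}/ivy-${ivy.install.version}.jar"
         dest="${user.home}/.ant/lib/ivy.jar"/>
</target>
```

Run `ant bootstrap` once. Later Ant invocations should then find the Ivy tasks, because Ant loads jars from that directory at startup.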

I don't know why the ANT project doesn't ship with ivy.
