Parse HTML using an Ant Script

Views: 26  Published: 2021/11/11 2:01:02  Tag: ant

Problem description

I need to retrieve some values from an HTML file. I need to use Ant so I can use these values in other parts of my script.

Can this even be achieved in Ant?

Recommended answer

build.xml

The build uses the Ivy plug-in to retrieve all third-party dependencies.

<project name="demo" default="print" xmlns:ivy="antlib:org.apache.ivy.ant">

    <target name="resolve">
        <ivy:resolve/>
        <ivy:cachepath pathid="build.path" conf="build"/>
    </target>

    <target name="parse" depends="resolve">
        <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>

        <groovy>
        import org.htmlcleaner.*

        def address = 'http://groovy.codehaus.org/'

        // Clean any messy HTML
        def cleaner = new HtmlCleaner()
        def node = cleaner.clean(address.toURL())

        // Convert from HTML to XML
        def props = cleaner.getProperties()
        def serializer = new SimpleXmlSerializer(props)
        def xml = serializer.getXmlAsString(node)

        // Parse the XML into a document we can work with
        def page = new XmlSlurper(false,false).parseText(xml)

        // Retrieve the logo URL
        properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
        </groovy>
    </target>

    <target name="print" depends="parse">
        <echo>
        Logo URL: ${logo}
        </echo>
    </target>

</project>

The parsing logic is plain Groovy. I love how easily you can walk the page's DOM tree:

// Retrieve the logo URL
properties["logo"] = page.body.div[0].div[1].div[0].div[0].div[0].img.@src
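
The indexed path above depends on the exact nesting of the page's div elements, so it stops working as soon as the layout changes. As a rough sketch only, and assuming the image of interest is the first img element on the page (an assumption, not something stated in the answer), the parse target could search the tree depth-first instead. The property name logo, the build.path classpath, and the URL are reused from the build file above.

```xml
<target name="parse" depends="resolve">
    <taskdef name="groovy" classname="org.codehaus.groovy.ant.Groovy" classpathref="build.path"/>

    <groovy>
    import org.htmlcleaner.*

    def address = 'http://groovy.codehaus.org/'

    // Clean the HTML and serialize it as XML, as in the original target
    def cleaner = new HtmlCleaner()
    def node = cleaner.clean(address.toURL())
    def xml = new SimpleXmlSerializer(cleaner.getProperties()).getXmlAsString(node)
    def page = new XmlSlurper(false, false).parseText(xml)

    // Assumption: the logo is the first img element in document order
    def img = page.depthFirst().find { it.name() == 'img' }

    // Set the Ant property only when an img element was found
    if (img != null) {
        properties["logo"] = img.@src.text()
    }
    </groovy>
</target>
```

If no img element is found, the logo property stays unset and the echo in the print target shows the literal ${logo} text, which makes the failure easy to spot.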

ivy.xml

Ivy is similar to Maven: it manages your dependencies on third-party software. Here it is used to pull down Groovy and the HtmlCleaner library that the Groovy logic uses:

<ivy-module version="2.0">
    <info organisation="org.myspotontheweb" module="demo"/>
    <configurations defaultconfmapping="build->default">
        <conf name="build" description="ANT tasks"/>
    </configurations>
    <dependencies>
        <dependency org="org.codehaus.groovy" name="groovy-all" rev="1.8.2"/>
        <dependency org="net.sourceforge.htmlcleaner" name="htmlcleaner" rev="2.2"/>
    </dependencies>
</ivy-module>

How to install Ivy

Ivy is a standard Ant plugin. Download its jar and place it in one of the following directories:

$HOME/.ant/lib
$ANT_HOME/lib
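
If copying the jar by hand is inconvenient, the build can fetch it itself. The target below is a sketch under assumptions: the target name bootstrap, the Ivy version 2.5.0, and the Maven Central URL are illustrative choices and not part of the original answer.

```xml
<!-- Hypothetical helper target: downloads the Ivy jar into the user's Ant library directory -->
<target name="bootstrap" description="Install Ivy into the user's Ant lib directory">
    <!-- Create the directory if it does not exist yet -->
    <mkdir dir="${user.home}/.ant/lib"/>
    <!-- Assumed version and repository URL; adjust to the release you need -->
    <get src="https://repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.0/ivy-2.5.0.jar"
         dest="${user.home}/.ant/lib/ivy.jar"
         usetimestamp="true"/>
</target>
```

Run ant bootstrap once. Ant reads its lib directories at startup, so the Ivy tasks only become available on the next invocation.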

I don't know why the Ant project doesn't ship with Ivy.
