How do you parse a web page and extract all the href links?

Views: 142  Posted: 2018/5/30 9:22:00  Tags: html, parsing, groovy


Problem description


I want to parse a web page in Groovy and extract all of the href links, along with the text of each link.

If the page contained these links:

<a href="http://www.google.com">Google</a><br />
<a href="http://www.apple.com">Apple</a>


the output would be:

Google, http://www.google.com
Apple, http://www.apple.com


I'm looking for a Groovy answer, i.e. the easy way!

Recommended answer


Assuming the page is well-formed XHTML, parse it with XmlSlurper, walk every element in the tree, keep the 'a' elements, and print each one's text and href.

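import groovy.xml.XmlSlurper  // Groovy 3+; on Groovy 2.x drop this line (XmlSlurper is auto-imported from groovy.util)

def input = """<html><body>
<a href = "http://www.hjsoft.com/">John</a>
<a href = "http://www.google.com/">Google</a>
<a href = "http://www.stackoverflow.com/">StackOverflow</a>
</body></html>"""

// Parse the well-formed markup into a GPath tree
def doc = new XmlSlurper().parseText(input)
// Visit every node depth-first, keep only <a> elements, and print "text, href"
doc.depthFirst().findAll { it.name() == "a" }.each {
    println "${it.text()}, ${it.@href.text()}"
}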
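If you want to reuse the links rather than just print them, the same approach can be wrapped in a function that returns the text/href pairs. This is a minimal sketch, assuming Groovy 3 or later (where `XmlSlurper` lives in `groovy.xml`) and well-formed XHTML input; `extractLinks` is a hypothetical helper name, and `'**'` is the GPath shorthand for `depthFirst()`.

```groovy
import groovy.xml.XmlSlurper

// Hypothetical helper: returns one [text, href] pair per <a> element.
// Assumes the input is well-formed XHTML, as in the answer above.
List<List<String>> extractLinks(String xhtml) {
    def doc = new XmlSlurper().parseText(xhtml)
    // '**' walks every descendant node, same as depthFirst()
    doc.'**'.findAll { it.name() == 'a' }.collect { a ->
        [a.text().trim(), a.@href.text()]
    }
}

def page = '''<html><body>
<a href="http://www.google.com">Google</a><br />
<a href="http://www.apple.com">Apple</a>
</body></html>'''

def links = extractLinks(page)
// Print in the "text, href" format requested in the question
links.each { println "${it[0]}, ${it[1]}" }
```

Like the answer above, this throws a parse exception on ordinary HTML that is not well-formed XML, for example an unclosed `<br>` tag.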
