How to find URLs in HTML using Java

Views: 61 | Published: 2021/7/17 19:49:40 | Tags: java string search web-crawler


Question

I have the following... I wouldn't say problem, but situation.

I have some HTML with tags and everything. I want to search the HTML for every URL. Right now I'm doing it by checking where it says 'h', then 't', then 't', then 'p', but I don't think that is a great solution.

Any good ideas?

Added: I'm looking for some kind of pseudocode but, just in case, I'm using Java for this project in particular.

Recommended answer

Try using an HTML parsing library, then search for <a> tags in the HTML document.

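// input is a java.io.File (or InputStream) holding the HTML; the last argument is the base URI for relative links
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Elements links = doc.select("a[href]"); // <a> elements that have an href attribute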
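
For a runnable illustration of the same idea without an external dependency, here is a minimal sketch that uses the HTML parser bundled with the JDK (javax.swing.text.html) instead of jsoup. This is a swapped-in technique, not the jsoup API shown above. The class name AnchorHrefs and the method hrefs are hypothetical names chosen for this example. The sketch collects the raw href value of every <a> start tag and, unlike the jsoup call with a base URI, does not resolve relative links.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects href values from <a> tags using the JDK's own parser.
final class AnchorHrefs {

    static List<String> hrefs(String html) throws IOException {
        final List<String> found = new ArrayList<>();

        // The parser calls back for every start tag; we only look at <a>.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add(href.toString()); // raw value, not resolved
                    }
                }
            }
        };

        // true = ignore any charset declared inside the document
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        String html = "<p><a href=\"http://example.com/a\">A</a> and <a href=\"/b\">B</a></p>";
        for (String href : hrefs(html)) {
            System.out.println(href);
        }
    }
}
```

Anchors without an href attribute are skipped, which mirrors the intent of the "a[href]" selector.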

"Not all the URLs are in tags; some are text, and some are in links or other tags."

You shouldn't scan the HTML source to achieve this.

You will end up with link elements that are not necessarily in the "text" of the page; for example, you could end up with "links" to JS scripts in the page.

The best way is still to use a tool made for the job.

You should grab HTML tags and cover the ones most likely to have "links" inside them (say, <h1>, <p>, <div>, etc.). HTML parsers provide regex-like functionality to filter the content of the tags, something similar to your "starts with HTTP" logic.
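
Once a parser has handed you the text of an element, the "starts with HTTP" logic can be expressed as a pattern match instead of a character-by-character check. The following is a minimal sketch under stated assumptions: the class TextUrls, the method findUrls, and the deliberately loose pattern are all hypothetical choices for illustration, and the input is assumed to be text already extracted from a tag, not raw HTML source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds http/https URLs inside plain text taken from an element.
final class TextUrls {

    // Assumed pattern: "http://" or "https://" followed by characters that are
    // not whitespace, quotes, or angle brackets. It is loose on purpose.
    private static final Pattern URL =
            Pattern.compile("https?://[^\\s<>\"']+", Pattern.CASE_INSENSITIVE);

    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL.matcher(text);
        while (matcher.find()) {
            String url = matcher.group();
            // Drop trailing punctuation that usually belongs to the sentence, not the URL.
            while (!url.isEmpty() && ".,;:!?)".indexOf(url.charAt(url.length() - 1)) >= 0) {
                url = url.substring(0, url.length() - 1);
            }
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(findUrls("See http://example.com/docs, or https://example.org/a?b=1."));
    }
}
```

A pattern like this will miss some valid URLs and accept some invalid ones, so it is best treated as a first filter over element text.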

[attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. select("[href*=/path/]")
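
To make the three operators concrete, they roughly correspond to Java's String.startsWith, String.endsWith, and String.contains applied to the attribute value. The sketch below is only an illustration of that meaning: AttrMatch and matches are hypothetical names, not part of jsoup or any other library, and a real selector engine may differ in details such as case handling, which this sketch ignores.

```java
// Hypothetical helper illustrating the meaning of ^=, $= and *= on an attribute value.
final class AttrMatch {

    static boolean matches(String attrValue, String op, String operand) {
        switch (op) {
            case "^=": return attrValue.startsWith(operand); // value starts with operand
            case "$=": return attrValue.endsWith(operand);   // value ends with operand
            case "*=": return attrValue.contains(operand);   // value contains operand
            default:
                throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        String href = "http://example.com/path/page.html";
        System.out.println(matches(href, "^=", "http"));   // the "starts with HTTP" check
        System.out.println(matches(href, "$=", ".html"));
        System.out.println(matches(href, "*=", "/path/"));
    }
}
```

With a selector-based parser, the first check is what a selector of the form [href^=http] expresses.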

See: jsoup.
