Jsoup.parse() vs. Jsoup.parse() - or How does URL detection work in Jsoup?

Views: 518 Published: 2018/12/20 23:17:09 Tags: java html-parsing jsoup


Question

Jsoup has two HTML parse() methods:

parse(String html) - "As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag."
parse(String html, String baseUri) - "The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag."

I am having difficulty understanding the difference between the two:

In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?
What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?
Lastly, but most importantly: Is baseUri the full URL of the HTML page (as phrased in the original documentation) or is it the base URL of the HTML page?

Recommended answer

It's used for, among others, Element#absUrl(), so that you can retrieve the (intended) absolute URL of an <a href>, <img src>, <link href>, <script src>, etc. For example:

for (Element link : document.select("a")) {
    System.out.println(link.absUrl("href"));
}

This is also very useful if you want to download and/or parse the linked resources.
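To make the effect of the base URI concrete, here is a minimal sketch that parses the same HTML once without and once with a baseUri and then calls absUrl(). The class name AbsUrlSketch, the helper absHref, and the example.com URLs are made up for illustration; it assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlSketch {
    // Hypothetical helper: parse the HTML (with or without a base URI)
    // and return the absolute URL of the first <a href>.
    static String absHref(String html, String baseUri) {
        Document doc = (baseUri == null)
                ? Jsoup.parse(html)            // no base URI known
                : Jsoup.parse(html, baseUri);  // URL the HTML was retrieved from
        Element link = doc.select("a").first();
        return link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"/docs/page.html\">Docs</a></p>";

        // Without a base URI and without <base href>, the relative link
        // cannot be made absolute, so absUrl() returns an empty string.
        System.out.println("[" + absHref(html, null) + "]");
        // prints "[]"

        // With the page URL as base URI, the link resolves against it.
        System.out.println(absHref(html, "http://example.com/a/index.html"));
        // prints "http://example.com/docs/page.html"
    }
}
```

The raw attribute value is still available through attr("href"); only absUrl() depends on the base URI.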

In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?

Some (poor) websites may have declared a <link> or <script> with a relative URL before the <base> tag. Or, if there is no <base> tag at all, then just the given baseUri will be used for resolving relative URLs of the entire document.
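The two cases (a page with no <base> tag and a page that declares one) can be sketched as follows. The class name BaseHrefSketch, the helper firstLink, and the example.com / cdn.example.org URLs are assumptions for illustration only; jsoup is assumed to be on the classpath. The sketch deliberately places the link after the <base> tag and makes no claim about elements that appear before it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BaseHrefSketch {
    // Hypothetical helper: parse with the given baseUri and return
    // the absolute URL of the first <a href> in the document.
    static String firstLink(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        return doc.select("a").first().absUrl("href");
    }

    public static void main(String[] args) {
        String pageUrl = "http://example.com/docs/index.html";

        // No <base> tag: the given baseUri is used for the whole document.
        String withoutBase =
                "<html><head><title>t</title></head>"
              + "<body><a href=\"page.html\">x</a></body></html>";
        System.out.println(firstLink(withoutBase, pageUrl));
        // prints "http://example.com/docs/page.html"

        // <base href> declared in the head: links after it resolve against it.
        String withBase =
                "<html><head><base href=\"http://cdn.example.org/assets/\"></head>"
              + "<body><a href=\"page.html\">x</a></body></html>";
        System.out.println(firstLink(withBase, pageUrl));
        // prints "http://cdn.example.org/assets/page.html"
    }
}
```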

What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?

In order to return the right URL from Element#absUrl(). This is purely for the end user's convenience. Jsoup doesn't need it in order to successfully parse the HTML on its own.

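Lastly, but most importantly: Is baseUri the full URL of the HTML page (as phrased in the original documentation) or is it the base URL of the HTML page?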

The former. If it were the latter, the documentation would be lying. The baseUri must not be confused with <base href>.
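Why passing the full page URL works can be seen from how relative references are resolved in general: the last path segment of the base (the file name) is dropped before the relative part is appended. The sketch below uses java.net.URI from the standard library as a stand-in, not Jsoup's own resolver, and the class name UrlResolveSketch and the example.com URLs are hypothetical.

```java
import java.net.URI;

public class UrlResolveSketch {
    // Resolve a relative reference against a base URL using the
    // standard library (a stand-in for what a parser does internally).
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "http://example.com/docs/index.html";

        // The file name "index.html" is dropped; the directory is kept.
        System.out.println(resolve(pageUrl, "img/logo.png"));
        // prints "http://example.com/docs/img/logo.png"

        // A leading slash resolves against the host root.
        System.out.println(resolve(pageUrl, "/top.css"));
        // prints "http://example.com/top.css"

        // Already-absolute URLs are returned unchanged.
        System.out.println(resolve(pageUrl, "http://other.org/x"));
        // prints "http://other.org/x"

        // Pitfall: a "directory" base without a trailing slash loses
        // its last segment, because it is treated like a file name.
        System.out.println(resolve("http://example.com/docs", "img/logo.png"));
        // prints "http://example.com/img/logo.png"
    }
}
```

So the URL the HTML was actually retrieved from is a safe value for baseUri; trimming it to a directory by hand is unnecessary and easy to get wrong.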
