Jsoup.parse() vs. Jsoup.parse() - or How does URL detection work in Jsoup?
Question
Jsoup has two HTML parse() methods:
- parse(String html) - "As no base URI is specified, absolute URL detection relies on the HTML including a <base href> tag."
- parse(String html, String baseUri) - "The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag."
I am having difficulty understanding the difference between the two:
- In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?
- What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?
- Lastly, but most importantly: Is baseUri the full URL of the HTML page (as phrased in the original documentation) or is it the base URL of the HTML page?
Recommended answer
It is used by, among others, Element#absUrl(), so that you can retrieve the (intended) absolute URL of an <a href>, <img src>, <link href>, <script src>, etc. For example:
for (Element link : document.select("a")) {
    System.out.println(link.absUrl("href"));
}
This is very useful if you also want to download and/or parse the linked resources.
In the 2nd parse() version, what does "resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag" mean? What if a <base href> tag never occurs in the page?
Some (poorly written) websites may declare a <link> or <script> with a relative URL before the <base> tag. And if there is no <base> tag at all, then just the given baseUri is used for resolving relative URLs throughout the document.
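The fallback described above can be sketched with the standard library alone. This is a rough illustration, not Jsoup's actual implementation: the class BaseHrefSketch, its methods, and the example.com URLs are made up for this example, and java.net.URI stands in for the resolution step. It assumes the usual HTML rule that a <base href> value is itself resolved against the page URL.

```java
import java.net.URI;

class BaseHrefSketch {
    // Hypothetical helper: choose the URL that relative links resolve against.
    // baseHref is the value of <base href>, or null if the page has no such tag.
    static URI effectiveBase(String pageUrl, String baseHref) {
        URI page = URI.create(pageUrl);
        // Without a <base href>, the page URL (the baseUri) is the only base available.
        return baseHref == null ? page : page.resolve(baseHref);
    }

    // Resolve one relative URL against the effective base.
    static String absolute(String pageUrl, String baseHref, String relative) {
        return effectiveBase(pageUrl, baseHref).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/articles/post.html";
        // No <base href>: prints http://example.com/articles/style.css
        System.out.println(absolute(page, null, "style.css"));
        // With <base href="/static/">: prints http://example.com/static/style.css
        System.out.println(absolute(page, "/static/", "style.css"));
    }
}
```

A relative URL that appears before the <base> tag corresponds to the first call: at that point only the baseUri is known, so it is the only thing the link can be resolved against.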
What is the purpose of absolute URL detection? Why does Jsoup need to find the absolute URL?
In order to return the right URL from Element#absUrl(). This is purely for the end user's convenience; Jsoup doesn't need it in order to parse the HTML successfully on its own.
Lastly, but most importantly: Is baseUri the full URL of the HTML page (as phrased in the original documentation) or is it the base URL of the HTML page?
The former. If it were the latter, the documentation would be lying. The baseUri must not be confused with <base href>.
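To see why passing the full page URL works as a base, here is a small sketch using java.net.URI from the standard library. It is only an approximation of generic relative-URL resolution, not a trace of what Jsoup does internally; the class name BaseUriDemo and the example.com URLs are assumptions made for illustration.

```java
import java.net.URI;

class BaseUriDemo {
    // Resolve a relative reference against a base URI using standard URI rules.
    static String resolve(String baseUri, String relative) {
        return URI.create(baseUri).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Full page URL as base: the last path segment (page.html) is dropped.
        System.out.println(resolve("http://example.com/dir/page.html", "img/logo.png"));
        // prints http://example.com/dir/img/logo.png

        // A root-relative reference ignores the directory part of the base.
        System.out.println(resolve("http://example.com/dir/page.html", "/top.css"));
        // prints http://example.com/top.css

        // A directory URL without the trailing slash loses its last segment too.
        System.out.println(resolve("http://example.com/dir", "img/logo.png"));
        // prints http://example.com/img/logo.png
    }
}
```

The last call shows the practical risk of trimming the URL by hand: dropping the file name and the trailing slash changes where relative links point, whereas the full URL the page was retrieved from resolves as intended.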