How to connect via HTTPS using Jsoup?


Question

It works fine over HTTP, but when I try to use an HTTPS source it throws the following exception:

10-12 13:22:11.169: WARN/System.err(332): javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:477)
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:328)
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.http.HttpConnection.setupSecureSocket(HttpConnection.java:185)
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeSslConnection(HttpsURLConnectionImpl.java:433)
10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeConnection(HttpsURLConnectionImpl.java:378)
10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:205)
10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:152)
10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:377)
10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)

Here's the relevant code:

try {
    doc = Jsoup.connect("https url here").get();
} catch (IOException e) {
    Log.e("sys", "couldn't get the html");
    e.printStackTrace();
}

Recommended answer

If you want to do it the right way, and/or you only need to deal with one site, then you basically need to grab the SSL certificate of the website in question and import it into a Java key store. This produces a JKS file, which you then set as the SSL trust store before using Jsoup (or java.net.URLConnection).

You can grab the certificate from your web browser's certificate store. Let's assume that you're using Firefox.

  1. Go to the website in question using Firefox, which in your case is https://web2.uconn.edu/driver/old/timepoints.php?stopid=10
  2. On the left of the address bar you'll see "uconn.edu" in blue (this indicates a valid SSL certificate).
  3. Click on it for details and then click the More information button.
  4. In the security dialog which appears, click the View Certificate button.
  5. In the certificate panel which appears, go to the Details tab.
  6. Click the deepest item of the certificate hierarchy, which in this case is "web2.uconn.edu", and finally click the Export button.

Now you have a web2.uconn.edu.crt file.

Next, open the command prompt and import it into a Java key store using the keytool command (it's part of the JRE):

keytool -import -v -file /path/to/web2.uconn.edu.crt -keystore /path/to/web2.uconn.edu.jks -storepass drowssap

-file must point to the location of the .crt file which you just exported. -keystore must point to the location of the generated .jks file (which you will then set as the SSL trust store). -storepass is required; you can enter whatever password you want, as long as it's at least 6 characters long.
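If you'd rather do the import from code than from the command line, roughly the same result can be sketched with the JDK's own KeyStore API. This is only an illustration for a desktop JVM: the class name CertImporter, the method name and the alias are made up for this example, and it does not prompt for confirmation the way keytool does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;

// Hypothetical helper (not part of Jsoup or keytool): adds one certificate to a JKS file.
public class CertImporter {

    public static void importCertificate(InputStream certStream, String alias,
                                         Path jksFile, char[] password)
            throws IOException, GeneralSecurityException {
        // Parse the exported certificate (PEM or DER encoded X.509).
        Certificate cert = CertificateFactory.getInstance("X.509")
                .generateCertificate(certStream);

        // Open the existing key store, or start with an empty one.
        KeyStore store = KeyStore.getInstance("JKS");
        if (Files.exists(jksFile)) {
            try (InputStream in = Files.newInputStream(jksFile)) {
                store.load(in, password);
            }
        } else {
            store.load(null, null);
        }

        // Register the certificate as a trusted entry and write the file back.
        store.setCertificateEntry(alias, cert);
        try (OutputStream out = Files.newOutputStream(jksFile)) {
            store.store(out, password);
        }
    }
}
```

A call such as CertImporter.importCertificate(Files.newInputStream(Paths.get("/path/to/web2.uconn.edu.crt")), "web2.uconn.edu", Paths.get("/path/to/web2.uconn.edu.jks"), "drowssap".toCharArray()) would then mirror the keytool command above.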

Now you have a web2.uconn.edu.jks file. You can finally set it as the SSL trust store before connecting, as follows:

System.setProperty("javax.net.ssl.trustStore", "/path/to/web2.uconn.edu.jks");
Document document = Jsoup.connect("https://web2.uconn.edu/driver/old/timepoints.php?stopid=10").get();
// ...
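The system property above applies to the whole JVM. If you would rather scope the trust store to individual connections, one possible approach is to load the JKS file into an SSLContext yourself. The sketch below uses only JDK classes; the class name TrustStoreLoader and its method are invented for this example, and it assumes a desktop JVM (the JKS key store type is typically not available on Android).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: builds a socket factory that trusts only the certificates in a JKS file.
public class TrustStoreLoader {

    public static SSLSocketFactory socketFactoryFrom(Path jksFile, char[] password)
            throws IOException, GeneralSecurityException {
        // Load the key store produced by keytool.
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(jksFile)) {
            trustStore.load(in, password);
        }

        // Derive trust managers from the certificates it contains.
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        // No client keys are needed, so only the trust managers are supplied.
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context.getSocketFactory();
    }
}
```

The resulting factory can be set on an individual HttpsURLConnection via setSSLSocketFactory(...). Recent Jsoup releases also appear to accept one through Connection.sslSocketFactory(...); check the version you are using before relying on that.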


As a completely different alternative, particularly when you need to deal with multiple sites (i.e. you're creating a World Wide Web crawler), you can also instruct Jsoup (basically, java.net.URLConnection) to blindly trust all SSL certificates. See also the section "Dealing with untrusted or misconfigured HTTPS sites" at the very bottom of this answer: Using java.net.URLConnection to fire and handle HTTP requests
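For reference, a trust-all setup along those lines might look roughly like the sketch below. It is not copied from the linked answer; the class and method names are invented for this example, and it assumes Jsoup is fetching through java.net.URLConnection so that the HttpsURLConnection defaults apply. Because it switches off certificate and hostname checks for the whole JVM, it only makes sense where the fetched content doesn't need to be trusted.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper: makes HttpsURLConnection accept any certificate and any hostname.
public class TrustAllCertificates {

    public static SSLContext newTrustAllContext() throws GeneralSecurityException {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: every client certificate is accepted.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: every server certificate is accepted.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        } };

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, null);
        return context;
    }

    // Call once, before the first HTTPS request is made.
    public static void installAsDefault() throws GeneralSecurityException {
        HttpsURLConnection.setDefaultSSLSocketFactory(newTrustAllContext().getSocketFactory());
        // Also accept certificates whose hostname does not match the URL.
        HttpsURLConnection.setDefaultHostnameVerifier((hostname, session) -> true);
    }
}
```

After calling TrustAllCertificates.installAsDefault(), a plain Jsoup.connect("https url here").get() should no longer fail on the trust anchor check, under the assumption stated above.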
