Java: how to find out if a URL is http or https?

Question

I am writing a web crawler tool in Java. When I type the website name, how can I make it so that it connects to that site in http or https without me defining the protocol?

try {
   Jsoup.connect("google.com").get();
} catch (IOException ex) {
   Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}

But I get this error:

java.lang.IllegalArgumentException: Malformed URL: google.com

What can I do? Are there any classes or libraries that do this?

For context: I have a list of 165 courses, each with 65-71 HTML pages that contain links throughout. I am writing a Java program to test whether each link is broken.

Recommended answer

You can write your own simple method to try both protocols, like:

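// Returns false if the site answers over http, true if only https works.
// Throws IOException if both attempts fail.
static boolean usesHttps(final String urlWithoutProtocol) throws IOException {
    try {
        // Try plain http first.
        Jsoup.connect("http://" + urlWithoutProtocol).get();
        return false;
    } catch (final IOException e) {
        // http failed, so fall back to https; a failure here propagates to the caller.
        Jsoup.connect("https://" + urlWithoutProtocol).get();
        return true;
    }
}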

Then, your original code can be:

try {
    boolean shouldUseHttps = usesHttps("google.com");
} catch (final IOException ex) {
    Logger.getLogger(LinkGUI.class.getName()).log(Level.SEVERE, null, ex);
}

Note: you should only use the usesHttps() method once per URL, to figure out which protocol to use. After you know that, you should connect using Jsoup.connect() directly. This will be more efficient.
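One way to follow that advice is to remember the result per site, so the detection runs only once. The sketch below is an illustration, not part of the answer above: the class name ProtocolResolver, the pluggable probe, and the headRequestSucceeds helper are all hypothetical names. It uses only java.net, so it runs without Jsoup. It tries http first, then https, in the same order as usesHttps().

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical helper: detects the protocol once per site and remembers it. */
class ProtocolResolver {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Predicate<String> reachable; // probe deciding whether a full URL responds

    ProtocolResolver(Predicate<String> reachable) {
        this.reachable = reachable;
    }

    /** Returns a full URL for the site, or null if neither protocol responds. */
    String resolve(String site) {
        // Input already carries a protocol: nothing to detect.
        if (site.startsWith("http://") || site.startsWith("https://")) {
            return site;
        }
        String cached = cache.get(site);
        if (cached != null) {
            return cached; // detected earlier, no new request needed
        }
        // Same order as usesHttps(): http first, then https.
        for (String scheme : new String[] {"http://", "https://"}) {
            String candidate = scheme + site;
            if (reachable.test(candidate)) {
                cache.put(site, candidate);
                return candidate;
            }
        }
        return null;
    }

    /** One possible probe (an assumption): HEAD request, status below 400 counts as reachable. */
    static boolean headRequestSucceeds(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status < 400;
        } catch (IOException | IllegalArgumentException e) {
            return false; // unreachable or not a valid URL
        }
    }
}
```

With this in place, new ProtocolResolver(ProtocolResolver::headRequestSucceeds).resolve("google.com") would return the full URL to pass to Jsoup.connect(), and repeated calls for the same site reuse the cached result. Taking the probe as a constructor argument also lets the logic be exercised without network access.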
