Use IP and host with Jsoup


Question

I would like to access a webpage using its IP address together with its host name, so that I can skip DNS lookups by keeping stored IP values for the domains I visit. With plain sockets, this is done by opening a connection to the IP and transmitting a GET request with the following syntax:

Socket s = new Socket([string_ip_address], 80);

Then transmitting:

GET [file_name] HTTP/1.1\r\n
Host: [some_name]
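The request syntax above can be sketched as a small helper that assembles the text sent over the socket. This is only an illustration: the class name `RawHttpRequest` and the method `build` are hypothetical, and the `Connection: close` header is an assumption added here so that an HTTP/1.1 server closes the connection once the response is sent (otherwise a read loop that waits for end of stream may block on a keep-alive connection).

```java
public class RawHttpRequest {

    /**
     * Builds a minimal HTTP/1.1 GET request.
     * The host name travels in the Host header, so the socket itself
     * can be opened against a bare IP address.
     */
    public static String build(String path, String host) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n" // assumption: ask the server to close after responding
                + "\r\n";                 // blank line terminates the header section
    }

    public static void main(String[] args) {
        // www.example.test is a placeholder host name
        System.out.print(build("/index.html", "www.example.test"));
    }
}
```

The string would be written to the socket's output stream as-is; note that each header line ends in `\r\n` and the header section ends with an empty line.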

But I would like to use Jsoup. The equivalent command to retrieve a page, say www.google.com, in Jsoup is:

Jsoup.connect("http://www.google.com").get();

But the provided site name must be the actual name, not the IP (because, if my limited understanding is correct, many domains can reside at the same IP address). So I figured I might try to alter the request made by Jsoup so that it includes both the site name and the IP. Jsoup uses HttpURLConnection in its underlying code (here's a code scrap from the Jsoup library itself, as found here: https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/helper/HttpConnection.java):

HttpURLConnection conn = (HttpURLConnection) req.url().openConnection();

conn.setRequestMethod(req.method().name());
conn.setInstanceFollowRedirects(false);
conn.setConnectTimeout(req.timeout());
conn.setReadTimeout(req.timeout());

if (conn instanceof HttpsURLConnection) {
    if (!req.validateTLSCertificates()) {
         initUnSecureTSL();
         ((HttpsURLConnection)conn).setSSLSocketFactory(sslSocketFactory);
         ((HttpsURLConnection)conn).setHostnameVerifier(getInsecureVerifier());
    }
}

if (req.method().hasBody())
    conn.setDoOutput(true);
if (req.cookies().size() > 0)
    conn.addRequestProperty("Cookie", getRequestCookieString(req));
for (Map.Entry<String, String> header : req.headers().entrySet()) {
    conn.addRequestProperty(header.getKey(), header.getValue());
}

So I thought of writing something like this:

Jsoup.connect(ip).header("Host", host);

But this doesn't seem to work. So, is there a known way to use IP + host in Jsoup requests (to spare DNS lookups), or is there some other way to skip the DNS lookup when using Jsoup?
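For illustration, the "stored values for domains" idea could be sketched as a small lookup table that rewrites a URL to point at a stored IP while keeping the original host name available for the Host header. Everything here is hypothetical (the class `PinnedHosts`, its methods, and the pinned address, which is just the one used elsewhere in this question); it only handles the URL rewriting and does not by itself make Jsoup send the Host header.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PinnedHosts {
    // host name (lower case) -> stored IP address
    private final Map<String, String> ipByHost = new ConcurrentHashMap<>();

    public void pin(String host, String ip) {
        ipByHost.put(host.toLowerCase(Locale.ROOT), ip);
    }

    /** Returns the URL with its host replaced by the stored IP, or unchanged if none is stored. */
    public String toIpUrl(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            return url;
        }
        String ip = ipByHost.get(host.toLowerCase(Locale.ROOT));
        if (ip == null) {
            return url;
        }
        try {
            // Rebuild the URI with the same scheme, port, path, and query, but the IP as host
            return new URI(uri.getScheme(), uri.getUserInfo(), ip, uri.getPort(),
                    uri.getPath(), uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Host name to send in the Host header for the given URL. */
    public static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        PinnedHosts pins = new PinnedHosts();
        pins.pin("www.buzzfeed.com", "23.34.229.118");
        System.out.println(pins.toIpUrl("http://www.buzzfeed.com/"));
        System.out.println(hostOf("http://www.buzzfeed.com/"));
    }
}
```

The rewritten URL would be the one passed to the HTTP client, and the value from `hostOf` would be the one supplied as the Host header.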

Thanks!

EDIT -

Just to be clear: using sockets with an IP and a host name works. For example, fetching the main page of buzzfeed via its IP in the following way:

Socket s = new Socket("23.34.229.118", 80);
BufferedReader reader = new BufferedReader(new InputStreamReader(s.getInputStream()));
PrintStream writer = new PrintStream(s.getOutputStream());
writer.println("GET / HTTP/1.0\r\nHost: www.buzzfeed.com\r\n");

String line;
while((line = reader.readLine()) != null)
{
    System.out.println(line);
}

s.close();

works perfectly fine. But I am unable to access the page via

Jsoup.connect("http://23.34.229.118");

and I am quite sure that's because I need to specify the host somehow, if that's even possible. My attempt with

Jsoup.connect("http://23.34.229.118").header("Host", "buzzfeed.com"); 

failed, and I got a 400 error.

Recommended answer

I believe I have found the solution.

The following line needs to be added to the code, before any request is made:

System.setProperty("sun.net.http.allowRestrictedHeaders", "true");

This is closely related to the following question, since the implementation of Jsoup uses HttpURLConnection: "Can I override the Host header where using java's HttpUrlConnection class?" (https://stackoverflow.com/questions/7648872/can-i-override-the-host-header-where-using-javas-httpurlconnection-class)

Apparently, Java by default blocks the ability to set certain restricted headers on an HttpURLConnection, and the Host header is one of them. A Host value passed through header() is therefore dropped unless the property above is set.
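As a rough, self-contained way to see the effect without Jsoup or an external site, the sketch below starts a throwaway local server that echoes back whatever Host header it receives, then requests it by IP through HttpURLConnection with the property set. The class name `HostHeaderOverride`, the method `fetchEchoedHost`, and the host name `www.example.test` are all hypothetical. It also assumes the property is set before the JDK's HTTP handler class is first used, since the value appears to be read once when that class initializes; passing `-Dsun.net.http.allowRestrictedHeaders=true` on the command line would be an alternative.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HostHeaderOverride {

    /** Requests a local echo server by IP and returns the Host header the server saw. */
    public static String fetchEchoedHost(String hostHeader) {
        // Assumption: this runs before any HttpURLConnection has been created in this JVM
        System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
        try {
            // Port 0 lets the OS pick a free port on the loopback interface
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                String seen = exchange.getRequestHeaders().getFirst("Host");
                byte[] body = String.valueOf(seen).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                // The URL contains only the IP, so no DNS lookup is involved
                HttpURLConnection conn = (HttpURLConnection)
                        URI.create("http://127.0.0.1:" + port + "/").toURL().openConnection();
                conn.setRequestProperty("Host", hostHeader);
                try (InputStream in = conn.getInputStream()) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchEchoedHost("www.example.test"));
    }
}
```

With Jsoup the same idea should apply: set the property once at startup, then use the call from the question, Jsoup.connect("http://23.34.229.118").header("Host", "buzzfeed.com"), followed by get(). If the property is set too late, the server would presumably see the IP and port as the Host value instead.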
