Use IP and host with Jsoup
Question
I would like to access a web page using an IP address and a host name, in order to save DNS lookup time by storing the resolved addresses of domains. With sockets, this can be done by opening a socket to the IP and sending a GET request with the following syntax:
Socket s = new Socket([string_ip_address], 80);
Then transmitting:
GET [file_name] HTTP/1.1\r\n
Host: [some_name]\r\n\r\n
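For reference, the raw request described above can be assembled as a string like this. It is a minimal sketch; the class and method names are hypothetical. It uses CRLF line endings and ends the header block with an empty line, as HTTP/1.1 requires. The `Connection: close` header is an addition (not in the original request) so that the server closes the socket after responding and a read loop terminates.

```java
// Hypothetical helper: builds a raw HTTP/1.1 GET request that names the
// virtual host explicitly, so it can be sent over a socket opened to an IP.
public class RawRequest {
    static String buildGet(String path, String host) {
        return "GET " + path + " HTTP/1.1\r\n"  // request line; method name is case-sensitive
             + "Host: " + host + "\r\n"           // selects the site on a shared IP
             + "Connection: close\r\n"            // ask the server to close after replying
             + "\r\n";                            // empty line ends the header block
    }

    public static void main(String[] args) {
        System.out.print(buildGet("/", "www.buzzfeed.com"));
    }
}
```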
But I would like to use Jsoup. The equivalent command to retrieve a page, say www.google.com, in Jsoup is:
Jsoup.connect("http://www.google.com").get();
But the URL must contain the actual site name, not the IP (because, if my limited understanding is correct, many domains can reside at the same IP address). So I figured I might try to alter the request made by Jsoup to include both the site name and the IP. Jsoup uses HttpURLConnection in its underlying code; here is an excerpt from the Jsoup library itself, as found at https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/helper/HttpConnection.java:
HttpURLConnection conn = (HttpURLConnection) req.url().openConnection();
conn.setRequestMethod(req.method().name());
conn.setInstanceFollowRedirects(false);
conn.setConnectTimeout(req.timeout());
conn.setReadTimeout(req.timeout());
if (conn instanceof HttpsURLConnection) {
    if (!req.validateTLSCertificates()) {
        initUnSecureTSL();
        ((HttpsURLConnection)conn).setSSLSocketFactory(sslSocketFactory);
        ((HttpsURLConnection)conn).setHostnameVerifier(getInsecureVerifier());
    }
}
if (req.method().hasBody())
    conn.setDoOutput(true);
if (req.cookies().size() > 0)
    conn.addRequestProperty("Cookie", getRequestCookieString(req));
for (Map.Entry<String, String> header : req.headers().entrySet()) {
    conn.addRequestProperty(header.getKey(), header.getValue());
}
So I thought of writing something like this:
Jsoup.connect(ip).header("Host", host);
But this doesn't seem to work. So, is there a known way to use IP + host in Jsoup requests (to spare DNS lookups), or some other way to skip the DNS lookup when using Jsoup?
Thanks!
EDIT:
Just to be clear: using sockets with an IP and host name works. For example, fetching the main page of BuzzFeed via its IP like this:
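import java.io.*;
import java.net.Socket;

// Connect straight to the IP; the Host header selects the site on that server
try (Socket s = new Socket("23.34.229.118", 80)) {
    BufferedReader reader = new BufferedReader(new InputStreamReader(s.getInputStream()));
    PrintStream writer = new PrintStream(s.getOutputStream());
    // HTTP needs CRLF line endings and an empty line after the last header;
    // println() would append the platform line separator instead
    writer.print("GET / HTTP/1.0\r\nHost: www.buzzfeed.com\r\n\r\n");
    writer.flush();
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}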
Works perfectly fine. But I am unable to access the page via
Jsoup.connect("http://23.34.229.118");
And I am quite sure that's because I need to specify the host somehow, if that's even possible. My attempt with
Jsoup.connect("http://23.34.229.118").header("Host", "buzzfeed.com");
failed and I got a 400 error.
Answer
I believe I have found the solution.
The following line needs to be added to the code, before any connection is opened:
System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
This is closely related to the question "Can I override the Host header where using java's HttpUrlConnection class?" (https://stackoverflow.com/questions/7648872/can-i-override-the-host-header-where-using-javas-httpurlconnection-class), since Jsoup's implementation uses HttpURLConnection.
Apparently, Java blocks (by default) the ability to set certain "restricted" headers, one of which is the Host header: HttpURLConnection silently ignores attempts to set them unless this property is true.
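To see the effect without sending any traffic, the sketch below uses plain HttpURLConnection: openConnection() only creates the connection object, so the stored request headers can be inspected before anything is sent. It assumes the JDK's default HTTP handler; the class and method names are hypothetical, and the IP and host are the ones from the question. The property is set in a static initializer because the JDK reads it once, when its HttpURLConnection implementation is first loaded, so setting it after the first HTTP connection has been created in the JVM has no effect (passing `-Dsun.net.http.allowRestrictedHeaders=true` on the command line avoids the ordering issue).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class HostHeaderCheck {
    static {
        // Runs before keptHostHeader() can create the first HttpURLConnection.
        // The JDK's HTTP handler reads this property once, at class initialization.
        System.setProperty("sun.net.http.allowRestrictedHeaders", "true");
    }

    // Creates (but does not connect) a connection to the given IP, tries to
    // override the Host header, and returns the value the connection kept.
    static String keptHostHeader(String ip, String host) {
        try {
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://" + ip + "/").toURL().openConnection();
            // Silently dropped if Host is still a restricted header
            conn.setRequestProperty("Host", host);
            return conn.getRequestProperty("Host");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Prints www.buzzfeed.com; without the static block it prints null
        System.out.println(keptHostHeader("23.34.229.118", "www.buzzfeed.com"));
    }
}
```

With Jsoup the same ordering applies: set the property before the first Jsoup.connect(...).get() call, then pass the host with .header("Host", ...) as in the question.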