403禁止使用Java但不禁用Web浏览器？ [英] 403 Forbidden with Java but not web browser?

查看：345 发布时间：2018/11/26 12:19:12 java http-status-code-403

本文介绍了403禁止使用Java但不禁用Web浏览器？的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在编写一个小型Java程序来获取给定Google搜索字词的结果数量。出于某种原因，在Java中我得到403 Forbidden但我在Web浏览器中得到了正确的结果。代码：

I am writing a small Java program to get the amount of results for a given Google search term. For some reason, in Java I am getting a 403 Forbidden but I am getting the right results in web browsers. Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;


public class DataGetter {

    public static void main(String[] args) throws IOException {
        getResultAmount("test");
    }

    private static int getResultAmount(String query) throws IOException {
        BufferedReader r = new BufferedReader(new InputStreamReader(new URL("https://www.google.com/search?q=" + query).openConnection()
                .getInputStream()));
        String line;
        String src = "";
        while ((line = r.readLine()) != null) {
            src += line;
        }
        System.out.println(src);
        return 1;
    }

}

错误：

Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.google.com/search?q=test
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
    at DataGetter.getResultAmount(DataGetter.java:15)
    at DataGetter.main(DataGetter.java:10)

为什么这样做？

推荐答案

你只需要设置用户代理头让它工作：

You just need to set user agent header for it to work:

URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();

BufferedReader r  = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));

StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
    sb.append(line);
}
System.out.println(sb.toString());

从您的异常堆栈跟踪可以看出，SSL是透明处理的。

The SSL was transparently handled for you as could be seen from your exception stacktrace.

获取结果数量实际上并不是那么简单，在此之后你必须通过获取cookie并解析重定向令牌链接来假装你是浏览器。

Getting the result amount is not really this simple though, after this you have to fake that you're a browser by fetching the cookie and parsing the redirect token link.

String cookie = connection.getHeaderField( "Set-Cookie").split(";")[0];
Pattern pattern = Pattern.compile("content=\\\"0;url=(.*?)\\\"");
Matcher m = pattern.matcher(response);
if( m.find() ) {
    String url = m.group(1);
    connection = new URL(url).openConnection();
    connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
    connection.setRequestProperty("Cookie", cookie );
    connection.connect();
    r  = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
    sb = new StringBuilder();
    while ((line = r.readLine()) != null) {
        sb.append(line);
    }
    response = sb.toString();
    pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
    m = pattern.matcher(response);
    if( m.find() ) {
        long amount = Long.parseLong(m.group(1).replaceAll(",", ""));
        return amount;
    }

}

运行完整代码我得到了 2930000000L 。

这篇关于403禁止使用Java但不禁用Web浏览器？的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

403禁止使用Java但不禁用Web浏览器？ [英] 403 Forbidden with Java but not web browser?

问题描述

推荐答案

相关文章

Java开发最新文章

热门教程

热门工具

登录关闭

403禁止使用Java但不禁用Web浏览器？ [英] 403 Forbidden with Java but not web browser?

问题描述

推荐答案

相关文章

Java开发最新文章

热门教程

热门工具

登录 关闭

登录关闭