How to fix a 403 response when using HttpURLConnection in Selenium, given that the links open manually without any issue


Problem description

I am checking for active links on a website with Selenium WebDriver and Java. I have collected the links into a list, and when verifying them I get a 403 Forbidden response for every link on the site. It is a public website that anyone can access, and the links work properly when clicked manually. I would like to know why the response is not 200 and what can be done in this situation.

This is for Selenium WebDriver with Java.

for (int j = 0; j < activelinks.size(); j++) {
    // Read the href once instead of querying the element three times.
    String href = activelinks.get(j).getAttribute("href");
    System.out.println("Active Link address and status >>> " + href);
    HttpURLConnection connection = (HttpURLConnection) new URL(href).openConnection();
    connection.connect();
    String response = connection.getResponseMessage();
    int responsecode = connection.getResponseCode();
    connection.disconnect();
    System.out.println(href + ">>" + response + " " + responsecode);
}

I expect the response code to be 200, but the actual output is 403.

Recommended answer

403 Forbidden

The HTTP 403 Forbidden client error status response code indicates that the server understood the request but refuses to authorize it.

This status is similar to 401, but in this case re-authenticating will make no difference. The access is permanently forbidden and tied to the application logic, such as insufficient rights to a resource.

I don't see any such issue in your code block. However, the WebDriver-controlled browser client may be getting detected, so that subsequent requests are blocked. Numerous factors can contribute to detection:

  • User agent
  • Plugins
  • Languages
  • WebGL
  • Browser features
  • Missing image
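Of the factors listed above, the user agent is the one that applies most directly to the code in the question: HttpURLConnection does not go through the browser, and by default it identifies itself with a `Java/<version>` User-Agent rather than the browser's. The following is a minimal sketch, assuming the server is filtering on that header, which may not be the case for every site. The class name `LinkStatusChecker`, the method names, the timeout values, and the User-Agent string are illustrative assumptions, not part of the original code.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class LinkStatusChecker {
    // Assumption: an illustrative desktop-browser User-Agent string;
    // substitute the value your own browser actually sends.
    static final String BROWSER_USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    // Prepares a request carrying the browser-like header; nothing is sent yet.
    static HttpURLConnection prepare(String href) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(href).openConnection();
        connection.setRequestProperty("User-Agent", BROWSER_USER_AGENT);
        connection.setConnectTimeout(5000); // milliseconds, arbitrary choice
        connection.setReadTimeout(5000);
        return connection;
    }

    // Sends the request and returns the HTTP status code.
    static int statusOf(String href) throws IOException {
        HttpURLConnection connection = prepare(href);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as a link to check.
        for (String href : args) {
            System.out.println(href + " >> " + statusOf(href));
        }
    }
}
```

In the loop from the question, `statusOf(href)` would replace the manual connect, read, and disconnect sequence. If the site still answers 403, the block is probably based on something other than the User-Agent header.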

You can find a couple of detailed discussions in:

  • How does recaptcha 3 know I'm using selenium/chromedriver?
  • Selenium and non-headless browser keeps asking for Captcha

A generic solution is to use a proxy or rotating proxies from the Free Proxy List.
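Since the status check in the question is made with HttpURLConnection rather than through the browser, a proxy would have to be applied to that connection as well. A minimal sketch using the standard `java.net.Proxy` class follows. The class name `ProxiedLinkChecker`, its method names, and the command-line layout are hypothetical, and the proxy host and port are placeholders you would take from whichever proxy list you use.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

final class ProxiedLinkChecker {
    // Describes an HTTP proxy at the given host and port.
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port));
    }

    // Requests href through the proxy and returns the HTTP status code.
    static int statusVia(Proxy proxy, String href) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(href).openConnection(proxy);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: <proxyHost> <proxyPort> <link> [<link> ...]
        if (args.length < 3) {
            return;
        }
        Proxy proxy = httpProxy(args[0], Integer.parseInt(args[1]));
        for (int i = 2; i < args.length; i++) {
            System.out.println(args[i] + " >> " + statusVia(proxy, args[i]));
        }
    }
}
```

Rotating proxies would amount to calling `httpProxy` with a different host and port for successive requests. Free proxies are often slow or short-lived, so connection failures should be expected.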

You can find a detailed discussion of changing the proxy server in chromedriver for scraping.

Outro

You can find a couple of relevant discussions in:

  • Can a website detect when you are using selenium with chromedriver?
  • Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
  • Failed to load resource: the server responded with a status of 429 (Too Many Requests) and 404 (Not Found) with ChromeDriver Chrome through Selenium
