How to set a proxy with authentication in Selenium chromedriver with Python?

Question

I am writing a script that crawls a website to gather some data. The site blocks me after too many requests, and with a proxy I could send more requests than I can now. I have set the proxy with the Chrome option --proxy-server:

options.add_argument('--proxy-server={}'.format('http://ip:port'))

However, I am using a paid proxy that requires authentication, so Chrome shows an alert box asking for a username and password.

I then tried passing the username and password in the proxy URL:

options.add_argument('--proxy-server={}'.format('http://username:password@ip:port'))

That does not seem to work either. While searching for a solution I found the approach below, and tried it both with and without the Chrome extension Proxy Auto Auth:

proxy = {'address': settings.PROXY,
             'username': settings.PROXY_USER,
             'password': settings.PROXY_PASSWORD}

capabilities = dict(DesiredCapabilities.CHROME)
capabilities['proxy'] = {'proxyType': 'MANUAL',
                             'httpProxy': proxy['address'],
                             'ftpProxy': proxy['address'],
                             'sslProxy': proxy['address'],
                             'noProxy': '',
                             'class': "org.openqa.selenium.Proxy",
                             'autodetect': False,
                             'socksUsername': proxy['username'],
                             'socksPassword': proxy['password']}
options.add_extension(os.path.join(settings.DIR, "extension_2_0.crx")) # proxy auth extension

Neither of these worked properly. The second attempt looked like it was working, because the proxy authentication alert disappeared, but when I checked my IP by googling "what is my IP" I confirmed that the proxy was not being used.

Can anyone help me authenticate against the proxy server with chromedriver?

Recommended answer

Selenium Chrome Proxy Authentication

Setting a chromedriver proxy with Selenium using Python

If you need to use a proxy with Python and the Selenium library with chromedriver, you usually use the following code (without any username and password):

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# hostname and port are placeholders, e.g. '192.168.3.2' and 8080
chrome_options.add_argument('--proxy-server=%s:%s' % (hostname, port))
driver = webdriver.Chrome(options=chrome_options)

This works fine unless the proxy requires authentication. If the proxy requires you to log in with a username and password, it will not work, and you have to use the trickier solution explained below. By the way, if you whitelist your server's IP address with the proxy provider, the proxy should not ask for credentials.
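As the question shows, putting credentials into the --proxy-server value does not help; as far as I know, Chrome only uses the scheme, host and port from that flag. If your provider hands you a single URL with embedded credentials, it can be split first. The following is a minimal sketch using only the standard library; split_proxy_url and the example URL are hypothetical names chosen for illustration.

```python
from urllib.parse import urlsplit, unquote


def split_proxy_url(proxy_url):
    """Split 'scheme://user:pass@host:port' into its parts.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    parts = urlsplit(proxy_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError('proxy URL must include host and port: %r' % proxy_url)
    return {
        'scheme': parts.scheme or 'http',
        'host': parts.hostname,
        'port': parts.port,
        # Credentials may be percent-encoded inside a URL.
        'username': unquote(parts.username) if parts.username else None,
        'password': unquote(parts.password) if parts.password else None,
    }


proxy = split_proxy_url('http://proxy-user:p%40ss@192.168.3.2:8080')
# Only scheme://host:port goes into the Chrome flag; the credentials are
# kept aside for the extension-based approach.
proxy_flag = '--proxy-server=%s://%s:%d' % (
    proxy['scheme'], proxy['host'], proxy['port'])
print(proxy_flag)
```

The username and password returned here are the values that the extension-based approach needs.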

HTTP Proxy Authentication with Chromedriver in Selenium

To set up proxy authentication, we generate a small Chrome extension as a zip file and load it into Chrome through chromedriver at startup, using the code below. The code configures Selenium with chromedriver to use an HTTP proxy that requires a username/password pair.

import os
import zipfile

from selenium import webdriver

PROXY_HOST = '192.168.3.2'  # rotating proxy or host
PROXY_PORT = 8080 # port
PROXY_USER = 'proxy-user' # username
PROXY_PASS = 'proxy-password' # password


manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version":"22.0.0"
}
"""

background_js = """
var config = {
        mode: "fixed_servers",
        rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
        }
    };

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
);
""" % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)


def get_chromedriver(use_proxy=False, user_agent=None):
    path = os.path.dirname(os.path.abspath(__file__))
    chrome_options = webdriver.ChromeOptions()
    if use_proxy:
        pluginfile = 'proxy_auth_plugin.zip'

        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
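    # Recent Selenium 4 releases no longer accept a positional driver path or
    # the chrome_options keyword; pass options= and keep chromedriver on PATH.
    driver = webdriver.Chrome(options=chrome_options)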
    return driver

def main():
    driver = get_chromedriver(use_proxy=True)
    #driver.get('https://www.google.com/search?q=my+ip+address')
    driver.get('https://httpbin.org/ip')

if __name__ == '__main__':
    main()

The function get_chromedriver returns a configured Selenium webdriver that you can use in your application. This code has been tested and works.
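One fragile spot in the code above is that the host and credentials are pasted into the JavaScript source with % formatting, so a password containing a double quote or a backslash would break background.js. A possible variation, sketched below, serialises the values with json.dumps instead. The names build_proxy_extension, MANIFEST and BACKGROUND_JS_TEMPLATE are my own and not from the original answer; the manifest and listener are the same as above.

```python
import json
import os
import tempfile
import zipfile

# Same manifest as in the answer, expressed as a Python dict.
MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
    "minimum_chrome_version": "22.0.0",
}

BACKGROUND_JS_TEMPLATE = """
var config = %(config)s;
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) { return {authCredentials: %(credentials)s}; },
    {urls: ["<all_urls>"]},
    ['blocking']
);
"""


def build_proxy_extension(host, port, username, password,
                          path='proxy_auth_plugin.zip'):
    """Write the proxy-auth extension zip and return its path (hypothetical helper)."""
    config = {
        "mode": "fixed_servers",
        "rules": {
            "singleProxy": {"scheme": "http", "host": host, "port": int(port)},
            "bypassList": ["localhost"],
        },
    }
    credentials = {"username": username, "password": password}
    # json.dumps escapes quotes and backslashes, giving valid JS literals.
    background_js = BACKGROUND_JS_TEMPLATE % {
        "config": json.dumps(config),
        "credentials": json.dumps(credentials),
    }
    with zipfile.ZipFile(path, 'w') as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=4))
        zp.writestr("background.js", background_js)
    return path


# Example call with a password that would break plain % formatting.
plugin = build_proxy_extension(
    '192.168.3.2', 8080, 'proxy-user', 'pa"ss\\word',
    path=os.path.join(tempfile.mkdtemp(), 'proxy_auth_plugin.zip'))
print(plugin)
```

The returned path can be passed to chrome_options.add_extension in the same way as pluginfile in get_chromedriver.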

Read more about the onAuthRequired event in Chrome.
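The main() function above loads https://httpbin.org/ip but never reads the result. To confirm that traffic really goes through the proxy, the reported IP can be extracted from the page and compared with your own address. The sketch below assumes the page body is the JSON document that httpbin returns, with an "origin" field; extract_origin_ips is a hypothetical helper, and the sample body stands in for a live response.

```python
import json


def extract_origin_ips(body_text):
    """Return the IP address(es) listed in an httpbin.org/ip response body."""
    origin = json.loads(body_text)['origin']
    # The field may hold several comma-separated addresses.
    return [ip.strip() for ip in origin.split(',')]


# With a live driver the text could be read like this (not executed here):
#   from selenium.webdriver.common.by import By
#   driver.get('https://httpbin.org/ip')
#   body_text = driver.find_element(By.TAG_NAME, 'body').text
sample_body = '{\n  "origin": "203.0.113.7"\n}'
print(extract_origin_ips(sample_body))
```

If the list contains the proxy's exit IP rather than your own address, the proxy is in use.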
