Access Denied page with headless Chrome on Linux while headed Chrome works on Windows using Selenium through Python


Question

I have this code that I'm using on my local machine:

from selenium import webdriver
chrom_path = r"C:\Users\user\sof\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrom_path)
link = 'https://www.google.com/'
driver.get(link)
s = driver.page_source
print((s.encode("utf-8")))
driver.quit()

This code returns the page source of the website. However, when I run the following code on a CentOS 7 Linux server:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
driver = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver", chrome_options=options)
driver.get("https://www.google.com")
s = driver.page_source
print((s.encode("utf-8")))
driver.quit()

This code should also return the page source, but instead it returns this:

b'<html><head>\n<title>Access Denied</title>\n</head><body>\n<h1>Access Denied</h1>\n \nYou don\'t have permission to access "http://www.newark.com/" on this server.<p>\nReference #18.456cd417.1576243477.e007b9f\n\n\n</p></body></html>'

Does anyone have an idea why the same code behaves differently on different operating systems?

Recommended answer

As per your code trials, non-headless Chrome on your local Windows machine works perfectly, while headless Chrome on the CentOS 7 Linux server is redirected to the Access Denied page:

<html><head>\n<title>Access Denied</title>\n</head><body>\n<h1>Access Denied</h1>\n \nYou don\'t have permission to access "http://www.newark.com/" on this server.<p>\nReference #18.456cd417.1576243477.e007b9f\n\n\n</p></body></html>
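A script that scrapes many pages may want to notice this block page instead of silently printing it. The following is a minimal sketch, assuming the block page always carries the title "Access Denied" as in the response above; the helper name is_access_denied is hypothetical and not part of Selenium.

```python
import re


def is_access_denied(page_source: str) -> bool:
    """Return True if the HTML title is exactly 'Access Denied' (case-insensitive)."""
    # Pull out the text between <title> and </title>, tolerating surrounding whitespace.
    match = re.search(r"<title>\s*(.*?)\s*</title>", page_source,
                      re.IGNORECASE | re.DOTALL)
    return bool(match) and match.group(1).strip().lower() == "access denied"


# Sample input modelled on the block page quoted above.
blocked = ("<html><head>\n<title>Access Denied</title>\n</head><body>\n"
           "<h1>Access Denied</h1>\n</body></html>")
normal = "<html><head><title>Google</title></head><body></body></html>"

print(is_access_denied(blocked))
print(is_access_denied(normal))
```

With Selenium, the value of driver.page_source could be passed to this helper right after driver.get(...).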

Access Denied

According to the article How to bypass "Access Denied" pages when using Headless Chrome, Chrome behaves slightly differently in headless mode than in headed mode. The core network stack is the same and the browser transmits requests identically at the packet level, which leaves the content of the request as the only place to look. Inspecting the requests made by headless and headed Chrome shows that headless Chrome identifies itself through its User-Agent header. The header sent by headed Chrome is nearly identical, except that it says Chrome where the headless one says HeadlessChrome.

The headless Chrome User-Agent is:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/71.0.3578.98 Safari/537.36

Solution

So a precise solution would be to set the User-Agent to that of headed Chrome. The User-Agent for Chrome v79.x is:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36

You can modify your code as follows and execute:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36')
driver = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver", chrome_options=options)
driver.get("https://www.google.com")
s = driver.page_source
print((s.encode("utf-8")))
driver.quit()
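Hard-coding a version string means it goes stale as Chrome updates. As an alternative, here is a minimal sketch that derives a headed-looking User-Agent from whatever the headless browser reports, assuming the only difference is the HeadlessChrome token described above. The function name headed_user_agent is hypothetical.

```python
def headed_user_agent(headless_ua: str) -> str:
    """Replace the 'HeadlessChrome/' product token with 'Chrome/'.

    Strings without the token are returned unchanged.
    """
    return headless_ua.replace("HeadlessChrome/", "Chrome/")


# The headless User-Agent quoted earlier in this answer.
headless = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) HeadlessChrome/71.0.3578.98 Safari/537.36")

headed = headed_user_agent(headless)
print(headed)
```

In practice the headless string could be read from a first, throwaway session with driver.execute_script("return navigator.userAgent"), and the converted value passed to a new session via options.add_argument('user-agent=' + headed). Note that this only changes the header; a site that detects headless browsers by other means may still deny access.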

Executed on Windows 10 OS.
