Cloudflare and Chromedriver: does Cloudflare distinguish between chromedriver and genuine Chrome?


Problem description

I would like to use chromedriver to scrape some stories from fanfiction.net. I tried the following:

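from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

# Raw string so the backslashes in the Windows path are not treated as escapes
path = r'D:\chromedriver\chromedriver.exe'

# Selenium 4 expects the driver path to be wrapped in a Service object
browser = webdriver.Chrome(service=Service(path))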
url1 = 'https://www.fanfiction.net/s/8832472'
url2 = 'https://www.fanfiction.net/s/5218118'

browser.get(url1)
time.sleep(5)
browser.get(url2)

The first link opens (sometimes I have to wait 5 seconds). When I try to load the second URL, Cloudflare intervenes and wants me to solve captchas, which are not solvable, or at least Cloudflare does not recognize them as solved. This also happens if I enter the links manually in the chromedriver window (that is, in the GUI). However, if I do the same things in normal Chrome, everything works fine (I do not even get the waiting period on the first link), even in private mode with all cookies deleted. I could reproduce this on several machines. Now my question: my intuition was that chromedriver is just the normal Chrome browser with the ability to be controlled. What is the difference from normal Chrome, how does Cloudflare distinguish the two, and how can I mask my chromedriver as normal Chrome? (I do not intend to load many pages in a very short time, so it should not look like a bot.) I hope my question is clear.

Recommended answer

This error message...

...implies that Cloudflare has detected your requests to the website as coming from an automated bot and is subsequently denying you access to the application.
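One signal that can separate an automated session from a manually started browser is the `navigator.webdriver` property, which the W3C WebDriver specification sets to true while the browser is under automation. Whether Cloudflare relies on this particular flag is not stated here, so treat the following as a minimal diagnostic sketch only. The helper name `automation_flag` is hypothetical, and `driver` is assumed to be any Selenium-style object exposing `execute_script`.

```python
def automation_flag(driver):
    """Return navigator.webdriver as reported by the current page.

    `driver` is assumed to be a Selenium WebDriver (or any object with a
    compatible execute_script method). The helper name is illustrative only.
    """
    # execute_script runs JavaScript in the current page and returns the result.
    value = driver.execute_script("return navigator.webdriver")
    # A missing or false flag comes back as None or False; normalize to bool.
    return bool(value)
```

Calling `automation_flag(browser)` in a chromedriver session and comparing the result with what the console of a regular Chrome window shows for `navigator.webdriver` is one way to see a difference between the two.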

In these cases a potential solution would be to use undetected-chromedriver to initialize the Chrome browsing context.

undetected-chromedriver is an optimized Selenium Chromedriver patch that does not trigger anti-bot services such as Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.

  • Code Block:

import undetected_chromedriver as uc
import time

# Use the options class shipped with undetected-chromedriver
options = uc.ChromeOptions()
options.add_argument("--start-maximized")
driver = uc.Chrome(options=options)
url1 = 'https://www.fanfiction.net/s/8832472'
url2 = 'https://www.fanfiction.net/s/5218118'
driver.get(url1)
time.sleep(5)
driver.get(url2)
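Because a block may still occur, it can help to check each loaded page before scraping it. The sketch below is one hedged way to do that: the marker strings are assumptions based on titles that Cloudflare interstitial pages have commonly used, not an official list, and `looks_like_challenge` is a hypothetical helper name.

```python
# Assumed marker strings; Cloudflare may change its page titles at any time.
CHALLENGE_MARKERS = ("just a moment", "attention required")


def looks_like_challenge(title):
    """Heuristically decide whether a page title belongs to a challenge page."""
    # Treat a missing title as an empty string and compare case-insensitively.
    lowered = (title or "").lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

With the driver from the code above, `looks_like_challenge(driver.title)` could be called after each `driver.get(...)` to decide whether to wait and retry or to stop.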

You can find a couple of relevant detailed discussions in:
