How to bypass Cloudflare bot/DDoS protection in Scrapy?


Question

I used to scrape an e-commerce site occasionally to get product price information. I had not used the scraper, which is built with Scrapy, in a while, and when I tried it yesterday I ran into a problem with bot protection.

The site uses CloudFlare's DDoS protection, which relies on JavaScript evaluation to filter out browsers (and therefore scrapers) that have JS disabled. Once the function is evaluated, a response containing the calculated number is generated. In return, the service sends back two authentication cookies which, when attached to each request, allow the site to be crawled normally. Here's the description of how it works.

I have also found cloudflare-scrape, a Python module that uses an external JS evaluation engine to calculate the number and send the request back to the server. I'm not sure how to integrate it into Scrapy, though. Or maybe there's a smarter way that avoids JS execution? In the end, it's just a form...

I'd welcome any help.

Recommended answer

So I solved this using cloudflare-scrape.

You need to add the following code to your scraper:

import cfscrape
from scrapy import Request

# Add this method to your spider class:
def start_requests(self):
    for url in self.start_urls:
        token, agent = cfscrape.get_tokens(url, 'Your preferable user agent, _optional_')
        yield Request(url=url, cookies=token, headers={'User-Agent': agent})

Put it alongside your parsing functions, and that's it!
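If start_urls contains several pages on the same host, it may be enough to solve the challenge once per host and reuse the result. The sketch below assumes that the cookies are valid for the whole host, which the answer does not confirm. TokenCache and request_kwargs are hypothetical names introduced for illustration, and the fetcher is passed in so that the helper itself does not depend on cfscrape or Scrapy.

```python
from urllib.parse import urlsplit


class TokenCache:
    """Caches (cookies, user_agent) pairs per host so each host is solved once.

    Hypothetical helper, not part of cfscrape or Scrapy.
    """

    def __init__(self, fetch_tokens):
        # fetch_tokens is any callable shaped like cfscrape.get_tokens:
        # it takes a URL and returns (cookies_dict, user_agent_string).
        self._fetch_tokens = fetch_tokens
        self._by_host = {}

    def request_kwargs(self, url):
        """Return keyword arguments that can be passed to scrapy.Request."""
        host = urlsplit(url).netloc
        if host not in self._by_host:
            # First URL seen for this host: run the (slow) challenge solver.
            self._by_host[host] = self._fetch_tokens(url)
        cookies, agent = self._by_host[host]
        return {"url": url, "cookies": cookies, "headers": {"User-Agent": agent}}
```

In a spider this could be used as cache = TokenCache(cfscrape.get_tokens) followed by yield Request(**cache.request_kwargs(url)) inside start_requests. Note that the cached cookies are tied to the user agent returned with them, so both are stored and sent together.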

Of course, you need to install cloudflare-scrape first and import it into your spider. You also need a JS execution engine installed. I already had Node.js, and it worked without complaints.
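Before running the spider, it can help to confirm that a JS engine is actually reachable. The following is a minimal sketch that assumes Node.js is the engine you rely on, as in the answer. find_node is a hypothetical helper name, and the check only looks at PATH; it says nothing about how cloudflare-scrape itself locates the engine.

```python
import shutil


def find_node():
    """Return the path to a Node.js executable on PATH, or None if absent."""
    # Some distributions install the binary as "nodejs" instead of "node".
    return shutil.which("node") or shutil.which("nodejs")


if __name__ == "__main__":
    node = find_node()
    if node is None:
        print("No Node.js executable found on PATH")
    else:
        print(f"Found Node.js at {node}")
```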
