How to bypass CloudFlare bot/DDoS protection in Scrapy?


Problem description

I occasionally scrape an e-commerce site to get product price information. I had not used my Scrapy-based scraper in a while, and when I tried to run it yesterday I ran into a problem with bot protection.

The site uses CloudFlare's DDoS protection, which relies on JavaScript evaluation to filter out browsers (and therefore scrapers) that have JS disabled. Once the function is evaluated, a response containing the calculated number is generated. In return, the service sends back two authentication cookies which, when attached to each request, allow the site to be crawled normally. Here's the description of how it works.

I have also found cloudflare-scrape, a Python module that uses an external JS evaluation engine to calculate the number and send the request back to the server. I'm not sure how to integrate it into Scrapy, though. Or maybe there's a smarter way that doesn't require JS execution? In the end, it's just a form...

I'd appreciate any help.

Solution

So I executed the JavaScript from Python with the help of cloudflare-scrape.

Add the following code to your scraper:

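  import cfscrape
  from scrapy import Request

  # Method of your spider class, next to the parsing callbacks.
  def start_requests(self):
      for url in self.start_urls:
          # Solves the CloudFlare JS challenge; returns (cookie dict, user agent used).
          token, agent = cfscrape.get_tokens(url, 'Your preferred user agent (optional)')
          # Send every cookie returned (the question mentions two, not only __cfduid),
          # together with the same user agent that obtained them.
          yield Request(url=url, cookies=token, headers={'User-Agent': agent})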

Put it alongside your parsing functions, and that's it!

Of course, you need to install cloudflare-scrape first and import it into your spider. You also need a JS execution engine installed; I already had Node.js, and it worked without complaints.
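
If solving the challenge once per URL turns out to be slow, one possible refinement is to separate token fetching from request building. The sketch below is an illustration only, not part of the original answer: `cf_request_kwargs` and the injected `get_tokens` callable are hypothetical names, and the per-host cache assumes that the cookies obtained for one URL are accepted for the rest of the same site. In a real spider you would pass `cfscrape.get_tokens` as `get_tokens`.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple
from urllib.parse import urlsplit

# (cookies, user_agent) pair: the shape the answer unpacks from cfscrape.get_tokens.
Tokens = Tuple[Dict[str, str], str]


def cf_request_kwargs(
    urls: Iterable[str],
    get_tokens: Callable[[str], Tokens],
) -> Iterator[dict]:
    """Yield keyword arguments suitable for scrapy.Request, one dict per URL."""
    cache: Dict[str, Tokens] = {}  # host -> tokens (assumption: valid site-wide)
    for url in urls:
        host = urlsplit(url).netloc
        if host not in cache:
            # Only the first URL of each host triggers the (slow) challenge.
            cache[host] = get_tokens(url)
        cookies, user_agent = cache[host]
        yield {
            "url": url,
            "cookies": dict(cookies),  # copy so requests do not share one dict
            "headers": {"User-Agent": user_agent},
        }
```

Inside `start_requests` this could then be used as `for kw in cf_request_kwargs(self.start_urls, cfscrape.get_tokens): yield Request(**kw)`. Keeping the token getter as a parameter also lets you exercise the function with a stub, without Scrapy, Node.js or network access.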
