How to bypass a 'cookiewall' when using scrapy?


Question

I'm a new user to Scrapy. After following the tutorials for extracting data from websites, I am trying to accomplish something similar on forums.

What I want is to extract all posts on a forum page (to start with). However, this particular forum has a 'cookie wall'. So whenever I want to extract from http://forum.fok.nl/topic/2413069, each session I first need to click the "Yes, I accept cookies" button.

My very basic scraper currently looks like this:

import scrapy

class FokSpider(scrapy.Spider):
    name = 'fok'
    allowed_domains = ['forum.fok.nl']
    start_urls = ['http://forum.fok.nl/']

    def parse(self, response):
        divs = response.xpath("//div").extract()
        yield {'divs': divs}

The divs I get are not from the actual forum thread, but from the cookie wall.

Here's the html of the button:

<a href="javascript:acceptCookies()" class="button acc CookiesOK" onclick="document.forms['cookies'].submit();acceptCookies();">Ja, Ik wil een goed werkende site...<span class="smaller">...en accepteer de cookies</span></a>

Can anyone point me in the right direction on how to bypass this cookiewall (artificially 'click' the button) and go to the actual webpage I'm trying to scrape? (Even the right Google search terms/documentation pages etc would be very helpful)

Answer

In the end I found multiple ways to solve this problem:

  • Simply appending /?token=77c1f767bc31859fee1ffe041343fa48&allowcookies=ACCEPTEER+ALLE+COOKIES to the start URL worked for this specific case.
  • I later switched to a CrawlSpider instead of a plain Spider; that let me add the xpath of the cookie button as the first rule.
  • Clicking the button with the earlier-mentioned Selenium also worked, but is a lot of hassle that is not really necessary.
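
The first option can be sketched with the standard library; the token value is the one quoted above (presumably specific to this site), and urlencode produces the same '+' encoding for the spaces:

```python
from urllib.parse import urlencode

base = 'http://forum.fok.nl/topic/2413069'
# Token copied from the cookie-accept request; presumably site-specific.
params = {
    'token': '77c1f767bc31859fee1ffe041343fa48',
    'allowcookies': 'ACCEPTEER ALLE COOKIES',  # spaces become '+'
}
start_url = base + '/?' + urlencode(params)
print(start_url)
# http://forum.fok.nl/topic/2413069/?token=77c1f767bc31859fee1ffe041343fa48&allowcookies=ACCEPTEER+ALLE+COOKIES
```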
