Scrapy response.follow query


Question

I followed the instructions from this page http://docs.scrapy.org/en/latest/intro/tutorial.html

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('span small::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

The above example works for their pagination markup:

<ul class="pager">
<li class="next">
<a href="/page/2/">Next <span aria-hidden="true">&rarr;</span></a>
</li>             
</ul>

I now want to change the response.follow call to crawl a site whose pagination links look like this:

Page 1:
<div class="pages-list">
<ul class="page">
<li class="page-current">1</li>
<li class="page-item"><a title="Page 2" href="/url2">2</a></li>
<li class="page-item"><a title="Page 3" href="/url3">3</a></li>

Page 2, and so on:
<div class="pages-list">
<ul class="page">
<li class="page-item"><a title="Page 1" href="/url1">1</a></li>
<li class="page-current">2</li>
<li class="page-item"><a title="Page 3" href="/url3">3</a></li>

I tried different variations to get the next page, starting from the first page.

I cannot see anything wrong, but my code only checks the first page and then stops.

next_page = response.css('li.page-current a::attr(href)').get()

next_page = response.css('li.page-current li a::attr(href)').get()

Neither works. Please advise: after page 1 the spider should move on to page 2, then page 3, and so on.

Answer

Pretty easy with XPath:

next_page = response.xpath('//li[@class="page-current"]/following-sibling::li[1]/a/@href').get()
