Scrapy response.follow query
Problem description
I followed the instructions from this page http://docs.scrapy.org/en/latest/intro/tutorial.html
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('span small::text').get(),
                'tags': quote.css('div.tags a.tag::text').getall(),
            }

        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
The example above works for their pages, where the pager looks like this:
<ul class="pager">
    <li class="next">
        <a href="/page/2/">Next <span aria-hidden="true">→</span></a>
    </li>
</ul>
I now want to change the response.follow call so that it works on a site whose pagination links are in this format:
Page 1
<div class="pages-list">
<ul class="page">
<li class="page-current">1</li>
<li class="page-item"><a title="Page 2" href="/url2">2</a></li>
<li class="page-item"><a title="Page 3" href="/url3">3</a></li>
Page 2 and so on
<div class="pages-list">
<ul class="page">
<li class="page-item"><a title="Page 1" href="/url1">1</a></li>
<li class="page-current">2</li>
<li class="page-item"><a title="Page 3" href="/url3">3</a></li>
I tried different variations to get the next page, starting from the first page. I cannot see anything wrong, but my code only checks the first page and then stops:
next_page = response.css('li.page-current a::attr(href)').get()
or
next_page = response.css('li.page-current li a::attr(href)').get()
Neither works. Please advise: after page 1 I want to check page 2, then page 3, and so on.
Recommended answer
Pretty easy with XPath:
next_page = response.xpath('//li[@class="page-current"]/following-sibling::li[1]/a/@href').get()