Scrapy LinkExtractor ignores parameters behind the # sign and thus will not follow the link

Views: 35 Published: 2021/7/17 18:36:04 scrapy


Problem description

I am trying to crawl a website with Scrapy where the pagination parameters sit behind the "#" sign in the URL. Scrapy seems to ignore everything after that character, so it only ever sees the first page.

For example:

http://www.rolex.de/de/watches/find-rolex.html#g=1&p=2

If you enter a question mark manually instead, the site loads page 1:

http://www.rolex.de/de/watches/find-rolex.html?p=2
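
To make the difference between the two URLs concrete, here is a small sketch that uses only Python's standard `urllib.parse`, not Scrapy itself. The helper name `describe` is made up for illustration. It shows that in the first URL the paging parameters live in the fragment (the part after #, which an HTTP client does not send to the server), whereas in the second URL they are part of the query string.

```python
from urllib.parse import urlsplit, parse_qs


def describe(url):
    """Return (query parameters, fragment parameters) of a URL as two dicts."""
    parts = urlsplit(url)
    return parse_qs(parts.query), parse_qs(parts.fragment)


fragment_url = "http://www.rolex.de/de/watches/find-rolex.html#g=1&p=2"
query_url = "http://www.rolex.de/de/watches/find-rolex.html?p=2"

# For fragment_url the query dict is empty; g and p sit in the fragment.
print(describe(fragment_url))
# For query_url the p parameter sits in the query; the fragment dict is empty.
print(describe(query_url))
```

Since the fragment never reaches the server, any paging driven by it presumably happens in the browser, which would explain why a plain HTTP fetch always gets the same first page.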

The stats from Scrapy tell me it fetched the first page:

DEBUG: Crawled (200) <GET http://www.rolex.de/de/watches/datejust/m126334-0014.html> (referer: http://www.rolex.de/de/watches/find-rolex.html)

My crawler looks like this:

start_urls = [
    'http://www.rolex.de/de/watches/find-rolex.html#g=1',
    'http://www.rolex.de/de/watches/find-rolex.html#g=0&p=2',
    'http://www.rolex.de/de/watches/find-rolex.html#g=0&p=3',
]

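rules = (
    # Product detail pages, handled by parse_item
    Rule(
        LinkExtractor(allow=[r'.*/de/watches/.*/m\d{3,}.*.\.html']),
        callback='parse_item'
    ),
    # Paginated listing pages, followed to discover more links
    Rule(
        LinkExtractor(allow=[r'.*/de/watches/find-rolex(/.*)?\.html#g=1(&p=\d*)?$']),
        follow=True
    ),
)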
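
One way to see why only the first page shows up: once the fragment is dropped, all three start URLs are the same address. The sketch below uses `urllib.parse.urldefrag` from the standard library to mimic that effect. It illustrates the symptom under the assumption that fragments are discarded before the request is made; it is not Scrapy's actual code path, and `strip_fragments` is a hypothetical helper name.

```python
from urllib.parse import urldefrag

start_urls = [
    'http://www.rolex.de/de/watches/find-rolex.html#g=1',
    'http://www.rolex.de/de/watches/find-rolex.html#g=0&p=2',
    'http://www.rolex.de/de/watches/find-rolex.html#g=0&p=3',
]


def strip_fragments(urls):
    """Return the distinct URLs left after removing fragments, in first-seen order."""
    seen = []
    for url in urls:
        base = urldefrag(url).url  # everything before the '#'
        if base not in seen:
            seen.append(base)
    return seen


unique = strip_fragments(start_urls)
print(unique)
```

All three entries reduce to a single URL here, which matches the log output above where only find-rolex.html appears as referer.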

How can I make Scrapy ignore the # inside the URL and visit the given URL?
