HTTP 403使用Python Scrapy时的响应 [英] HTTP 403 Responses when using Python Scrapy

查看：432 发布时间：2018/7/9 16:22:06 python http scrapy

本文介绍了HTTP 403使用Python Scrapy时的响应的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Windows Vista 64位上使用Python.org版本2.7 64位。我一直在测试以下Scrapy代码以递归方式刮取网站www.whoscored.com上的所有页面，这是用于足球统计：

I am using Python.org version 2.7 64 bit on Windows Vista 64 bit. I have been testing the following Scrapy code to recursively scrape all the pages at the site www.whoscored.com, which is for football statistics:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import Item
from scrapy.spider import BaseSpider
from scrapy import log
from scrapy.cmdline import execute
from scrapy.utils.markup import remove_tags


class ExampleSpider(CrawlSpider):
    name = "goal3"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/"]
    rules = [Rule(SgmlLinkExtractor(allow=()), 
                  follow=True),
             Rule(SgmlLinkExtractor(allow=()), callback='parse_item')
    ]
    def parse_item(self,response):
        self.log('A response from %s just arrived!' % response.url)
        scripts = response.selector.xpath("normalize-space(//title)")
        for scripts in scripts:
            body = response.xpath('//p').extract()
            body2 = "".join(body)
            print remove_tags(body2).encode('utf-8')  


execute(['scrapy','crawl','goal3'])

代码执行没有任何错误，但是刮掉了4623页，217得到了200的HTTP响应代码，2得到了302的代码，4404得到了403响应。任何人都可以在代码中看到任何明显的原因，为什么会出现这种情况？这可能是来自网站的反刮痧措施吗？通常的做法是减慢为阻止这种情况而提交的提交数量吗？

The code is executing without any errors, however of the 4623 pages scraped, 217 got a HTTP response code of 200, 2 got a code of 302 and 4404 got a 403 response. Can anyone see anything immediately obvious in the code as to why this might be? Could this be an anti Scraping measure from the site? Is it usual practice to slow the number of submissions made to stop this happening?

谢谢

HTTP 403使用Python Scrapy时的响应 [英] HTTP 403 Responses when using Python Scrapy

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

HTTP 403使用Python Scrapy时的响应 [英] HTTP 403 Responses when using Python Scrapy

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭