Scrapy Crawl URLs in Order


Problem Description

So, my problem is relatively simple. I have one spider crawling multiple sites, and I need it to return the data in the order I write it in my code. It's posted below.

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from mlbodds.items import MlboddsItem

class MLBoddsSpider(BaseSpider):
    name = "sbrforum.com"
    allowed_domains = ["sbrforum.com"]
    start_urls = [
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110328/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110329/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110330/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//div[@id="col_3"]//div[@id="module3_1"]//div[@id="moduleData4952"]')
        items = []
        for site in sites:
            item = MlboddsItem()
            item['header'] = site.select('//div[@class="scoreboard-bar"]//h2//span[position()>1]//text()').extract()
            item['game1'] = site.select('/*//table[position()=1]//tr//td[@class="tbl-odds-c2"]//text() | /*//table[position()=1]//tr//td[@class="tbl-odds-c4"]//text() | /*//table[position()=1]//tr//td[@class="tbl-odds-c6"]//text()').extract()
            items.append(item)
        return items

The results are returned in a random order; for example, it returns the 29th, then the 28th, then the 30th. I've tried changing the scheduler order from DFO to BFO, in case that was the problem, but that didn't change anything.

Recommended Answer

start_urls defines the URLs that the start_requests method uses. Your parse method is called with a response for each start URL as its page finishes downloading. But you cannot control the loading times: the first start URL might be the last to reach parse.
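For reference, the default start_requests of the old BaseSpider is roughly equivalent to the following sketch (a simplification; the downloader then fetches these requests concurrently, which is exactly why the response order is not guaranteed):

def start_requests(self):
    # One request per start URL, emitted in list order -- but the
    # responses come back in whatever order the downloads complete.
    for url in self.start_urls:
        yield self.make_requests_from_url(url)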

A solution -- override the start_requests method and add a meta with a priority key to each generated request. In parse, extract this priority value and add it to the item. In a pipeline, do something based on that value. (I don't know why, or where, you need these URLs to be processed in this order.)
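A minimal sketch of this approach, keeping the question's old Scrapy API (BaseSpider, HtmlXPathSelector; in current Scrapy you would subclass scrapy.Spider). The priority field on the item and the OrderedOutputPipeline class are assumptions for illustration, not part of the original code:

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from mlbodds.items import MlboddsItem

class MLBoddsSpider(BaseSpider):
    name = "sbrforum.com"
    allowed_domains = ["sbrforum.com"]
    start_urls = [
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110328/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110329/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110330/"
    ]

    def start_requests(self):
        # Tag every request with its position in start_urls.
        for index, url in enumerate(self.start_urls):
            yield Request(url, meta={'priority': index})

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        for site in hxs.select('//div[@id="moduleData4952"]'):
            item = MlboddsItem()
            # 'priority' is a hypothetical extra field on MlboddsItem.
            item['priority'] = response.meta['priority']
            item['header'] = site.select('//div[@class="scoreboard-bar"]//h2//span[position()>1]//text()').extract()
            yield item

# pipelines.py -- buffer all items, then emit them in priority order.
class OrderedOutputPipeline(object):
    def __init__(self):
        self.items = []

    def process_item(self, item, spider):
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # Called when the spider finishes: sort and write the items here.
        for item in sorted(self.items, key=lambda i: i['priority']):
            pass  # write to a file, database, etc.

The pipeline still has to be enabled through the ITEM_PIPELINES setting; note that the crawl itself stays concurrent, and only the output is reordered.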

Or make it kind of synchronous -- store these start URLs somewhere, and put only the first of them into start_urls. In parse, process the response and yield the item(s), then take the next URL from your storage and make a request for it with parse as the callback.
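A minimal sketch of the synchronous variant under the same assumptions (other_urls is a made-up name for the storage the answer mentions):

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from mlbodds.items import MlboddsItem

class MLBoddsSpider(BaseSpider):
    name = "sbrforum.com"
    allowed_domains = ["sbrforum.com"]
    # Only the first URL goes into start_urls; the rest wait their turn.
    start_urls = ["http://www.sbrforum.com/mlb-baseball/odds-scores/20110328/"]
    other_urls = [
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110329/",
        "http://www.sbrforum.com/mlb-baseball/odds-scores/20110330/"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        for site in hxs.select('//div[@id="moduleData4952"]'):
            item = MlboddsItem()
            item['header'] = site.select('//div[@class="scoreboard-bar"]//h2//span[position()>1]//text()').extract()
            yield item
        # Request the next URL only after this page's items are out,
        # so the pages are fetched and parsed strictly in list order.
        if self.other_urls:
            yield Request(self.other_urls.pop(0), callback=self.parse)

The trade-off is that downloads no longer overlap, so the crawl is slower than letting Scrapy schedule all start URLs at once.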
