Scrapy spider not returning any results
Question
This is my first attempt at creating a spider, so please bear with me if I have not done it properly. Here is the link to the website I am trying to extract data from: http://www.4icu.org/in/. I want the entire list of colleges displayed on the page, but when I run the following spider I get an empty JSON file. My items.py:
import scrapy

class CollegesItem(scrapy.Item):
    # define the fields for your item here like:
    link = scrapy.Field()
Here is the spider, colleges.py:
import scrapy
from scrapy.spider import Spider
from scrapy.http import Request

class CollegesItem(scrapy.Item):
    # define the fields for your item here like:
    link = scrapy.Field()

class CollegesSpider(Spider):
    name = 'colleges'
    allowed_domains = ["4icu.org"]
    start_urls = ('http://www.4icu.org/in/',)

    def parse(self, response):
        return Request(
            url = "http://www.4icu.org/in/",
            callback = self.parse_fixtures
        )

    def parse_fixtures(self,response):
        sel = response.selector
        for div in sel.css("col span_2_of_2>div>tbody>tr"):
            item = Fixture()
            item['university.name'] = tr.xpath('td[@class="i"]/span /a/text()').extract()
            yield item
Recommended answer
As stated in the comments on the question, there are some issues with your code.
First of all, you do not need two methods, because in the parse method you request the same URL that is already in start_urls.
To get some information from the site, try the following code:
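def parse(self, response):
    # Rows of the table in the fifth "section group" block; note there is no tbody in the path.
    for tr in response.xpath('//div[@class="section group"][5]/div[@class="col span_2_of_2"][1]/table//tr'):
        # Skip rows that do not contain a td with class "i".
        if tr.xpath(".//td[@class='i']"):
            name = tr.xpath('./td[1]/a/text()').extract()[0]
            location = tr.xpath('./td[2]//text()').extract()[0]
            print(name, location)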
Then adjust it to your needs to fill your item (or items).
As you can see, your browser displays an additional tbody in the table that is not present in the HTML Scrapy receives. This means you cannot take what the browser's inspector shows at face value; check your selectors against the markup Scrapy actually downloads.
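The tbody effect can be reproduced without touching the network. The sketch below uses Python's standard-library `xml.etree.ElementTree` in place of Scrapy's selectors, purely to stay dependency-free, and the markup is a made-up stand-in for the raw page source (the university names and links are invented). It compares a path copied from a browser inspector with one that does not assume a tbody.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the raw HTML the server sends:
# the <tr> elements sit directly under <table>, with no <tbody>.
RAW_HTML = """
<div class="col span_2_of_2">
  <table>
    <tr><td class="i"><a href="/reviews/1.htm">Example University</a></td><td>Delhi</td></tr>
    <tr><td class="i"><a href="/reviews/2.htm">Sample Institute</a></td><td>Mumbai</td></tr>
  </table>
</div>
""".strip()

root = ET.fromstring(RAW_HTML)

# Path as a browser inspector would suggest it, including the inserted tbody.
browser_style = root.findall(".//table/tbody/tr")

# Path that tolerates any (or no) element between table and tr.
tolerant = root.findall(".//table//tr")

names = [tr.find("./td[@class='i']/a").text for tr in tolerant]

print(len(browser_style), len(tolerant))
print(names)
```

In a spider, the equivalent comparison would be `response.xpath('//table/tbody/tr')` against `response.xpath('//table//tr')`. Running `scrapy shell` against the page URL is a convenient way to try both on the real response before putting them in a spider.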