scrapy spider not returning any results


Problem Description

This is my first attempt at creating a spider, so kindly bear with me if I have not done it properly. Here is the link to the website I am trying to extract data from: http://www.4icu.org/in/. I want the entire list of colleges that is displayed on the page, but when I run the following spider I get back an empty JSON file. My items.py:

    import scrapy
    class CollegesItem(scrapy.Item):
        # define the fields for your item here like:
        link = scrapy.Field()

Here is the spider, colleges.py:

    import scrapy
    from scrapy.spider import Spider
    from scrapy.http import Request

    class CollegesItem(scrapy.Item):
        # define the fields for your item here like:
        link = scrapy.Field()

    class CollegesSpider(Spider):
        name = 'colleges'
        allowed_domains = ["4icu.org"]
        start_urls = ('http://www.4icu.org/in/',)

        def parse(self, response):
            return Request(
                url = "http://www.4icu.org/in/",
                callback = self.parse_fixtures
            )
        def parse_fixtures(self,response):
            sel = response.selector
            for div in sel.css("col span_2_of_2>div>tbody>tr"):
                item = Fixture()
                item['university.name'] = tr.xpath('td[@class="i"]/span  /a/text()').extract()
                yield item

Recommended Answer

As stated in the comments on the question, there are some issues with your code.

First of all, you do not need two methods: Scrapy already calls parse for every URL in start_urls, so requesting http://www.4icu.org/in/ again inside parse is redundant.
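In other words, the extra Request can simply be dropped and the extraction done directly in parse. A rough sketch of the reduced spider, keeping the class names from the question:

    from scrapy.spider import Spider  # newer Scrapy versions use scrapy.spiders instead

    class CollegesSpider(Spider):
        name = 'colleges'
        allowed_domains = ["4icu.org"]
        start_urls = ('http://www.4icu.org/in/',)

        def parse(self, response):
            # extract the data here directly instead of issuing another
            # Request for the same page (see the code below)
            pass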

To get some information from the site, try the following code:

    def parse(self, response):
        # iterate over the rows of the university table (no tbody in the raw HTML)
        for tr in response.xpath('//div[@class="section group"][5]/div[@class="col span_2_of_2"][1]/table//tr'):
            # only rows containing a <td class="i"> hold university data
            if tr.xpath(".//td[@class='i']"):
                name = tr.xpath('./td[1]/a/text()').extract()[0]
                location = tr.xpath('./td[2]//text()').extract()[0]
                print(name, location)

Adjust it to your needs to fill your item (or items).
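For example, assuming the question's CollegesItem were extended with name and location fields (an assumption; the original item only defines link), the loop could yield populated items instead of printing them. A sketch only:

    # items.py -- hypothetical extension of the question's item
    class CollegesItem(scrapy.Item):
        link = scrapy.Field()
        name = scrapy.Field()
        location = scrapy.Field()

    # inside CollegesSpider
    def parse(self, response):
        for tr in response.xpath('//div[@class="section group"][5]/div[@class="col span_2_of_2"][1]/table//tr'):
            if tr.xpath(".//td[@class='i']"):
                item = CollegesItem()
                item['name'] = tr.xpath('./td[1]/a/text()').extract()[0]
                item['location'] = tr.xpath('./td[2]//text()').extract()[0]
                # assuming the <a> in the first cell links to the university's page
                item['link'] = tr.xpath('./td[1]/a/@href').extract()[0]
                yield item

Running scrapy crawl colleges -o colleges.json should then write the scraped items to the JSON file instead of leaving it empty.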

As you can see, your browser displays an additional tbody inside the table which is not present in the HTML that Scrapy downloads. This means you often need to double-check what you see in the browser against the raw response.
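One way to see what Scrapy actually receives (as opposed to the DOM your browser builds) is the scrapy shell. A sketch of that check:

    # In a terminal:
    #   scrapy shell "http://www.4icu.org/in/"
    # Then, inside the shell:
    response.xpath('//table/tbody')       # empty selector list if the raw HTML has no tbody
    response.xpath('//table//tr')[:3]     # the rows Scrapy can actually see
    view(response)                        # open the downloaded HTML in your browser

If the rows sit directly under table here while your browser's inspector shows a tbody, the tbody was added by the browser, not sent by the server.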
