How to iterate over divs in Scrapy?

Problem description

It is probably a very trivial question, but I am new to Scrapy. I've tried to find a solution to my problem, but I just can't see what is wrong with this code.

My goal is to scrape all of the opera shows from a given website. The data for every show is inside one div with the class "row-fluid row-performance ". I am trying to iterate over these divs to retrieve the data, but it doesn't work: every iteration gives me the content of the first div (I get the same show 19 times instead of different items).

Thanks for any advice!

import scrapy
from ..items import ShowItem

class OperaSpider(scrapy.Spider):
    name = "opera"
    allowed_domains = ["http://www.opera.krakow.pl"]
    start_urls = [
        "http://www.opera.krakow.pl/pl/repertuar/na-afiszu/listopad"

    ]


    def parse(self, response):
        divs = response.xpath('//div[@class="row-fluid row-performance    "]')
        for div in divs:
            item= ShowItem()
            item['title'] = div.xpath('//h2[@class="item-title"]/a/text()').extract()
            item['time'] = div.xpath('//div[@class="item-time vertical-center"]/div[@class="vcentered"]/text()').extract()
            item['date'] = div.xpath('//div[@class="item-date vertical-center"]/div[@class="vcentered"]/text()').extract()
            yield item

Recommended answer

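Try changing the XPaths inside the for loop to start with .//, that is, put a dot in front of the double slash. An XPath that starts with // is absolute: even when you call it on div, it searches the whole document from the root, so every iteration matches the same elements. With .// the search is relative to the current div. You can also try extract_first() instead of extract(): it returns the first match as a single string (or None) rather than a list, and see if that gives you better results.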
