Yield multiple items using scrapy


Question

I'm scraping data from the following URL:
http://www.indexmundi.com/commodities/?commodity=gasoline

Two sections on the page contain prices: "Gulf Coast Gasoline Futures End of Day Settlement Price" and "Gasoline Daily Price".

I want to scrape data from both sections as two different items. Here is the code which I've written:

if dailyPrice:
    item['description'] = u''.join(dailyPrice.xpath(".//h1/text()").extract())
    item['price'] = u''.join(dailyPrice.xpath(".//span/text()").extract())
    item['unit'] = dailyPrice.xpath(".//div/p/text()").extract()[0].split(',')[-1]
    regex = re.compile("Source:(.*)", re.IGNORECASE | re.UNICODE)
    result = re.search(regex, u''.join(dailyPrice.xpath(".//div/p/text()").extract()))
    if result:
        item['source'] = result.group(1).strip()

    yield item

if futurePrice:
    item['description'] = u''.join(futurePrice.xpath(".//h1/text()").extract())
    item['price'] = u''.join(futurePrice.xpath(".//span/text()").extract())
    item['unit'] = u''.join(futurePrice.xpath(".//div[2]/table//tr[1]/td/text()").extract())
    source = futurePrice.xpath(".//div[2]/table//tr[4]/td/a/text()").extract()
    if source:
        item['source'] = u' - '.join(source)
    else:
        item['source'] = ''

    yield item

Will this code work as written, or is there a more correct way to do this?

Solution

It should work just fine. You can yield as many items from a parse callback as you need. Just some notes:

  1. In the second case it's better to create a new item than to reuse the old one. Both yields would otherwise refer to the same object, and you never know what has happened to the old item reference, so the second set of assignments may overwrite and lose the previous data.

  2. You can create different item types for your two cases and treat them differently in the pipeline.
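
The two notes above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DailyPriceItem`, `FuturePriceItem`, `PricePipeline`, `reuse_one_item` and `build_fresh_items` are hypothetical names, the plain generator functions stand in for the `parse` callback, and the values are placeholders rather than scraped data. Nothing here imports Scrapy; the pipeline class simply has the `process_item(item, spider)` shape that Scrapy item pipelines use. Recent Scrapy versions accept dataclass objects as items; with older versions, `scrapy.Item` subclasses would fill the same role.

```python
from dataclasses import dataclass


@dataclass
class DailyPriceItem:
    """Hypothetical item type for the daily price section."""
    description: str
    price: str


@dataclass
class FuturePriceItem:
    """Hypothetical item type for the futures price section."""
    description: str
    price: str


def reuse_one_item(sections):
    """Anti-pattern: mutate and re-yield the same dict for every section."""
    item = {}
    for description, price in sections:
        item["description"] = description
        item["price"] = price
        yield item


def build_fresh_items(daily, future):
    """Yield a new, separately typed item for each section that is present."""
    if daily:
        yield DailyPriceItem(*daily)
    if future:
        yield FuturePriceItem(*future)


class PricePipeline:
    """Shaped like a Scrapy item pipeline; dispatches on the item type."""

    def process_item(self, item, spider):
        if isinstance(item, FuturePriceItem):
            # Example of type-specific handling: tag futures descriptions.
            item.description = "[futures] " + item.description
        return item


# Placeholder values, not real prices.
daily = ("Gasoline Daily Price", "daily-price-placeholder")
future = (
    "Gulf Coast Gasoline Futures End of Day Settlement Price",
    "future-price-placeholder",
)

# Collecting the reused item shows that both entries are the same object.
reused = list(reuse_one_item([daily, future]))

# Fresh items keep their own data and can be told apart in the pipeline.
pipeline = PricePipeline()
fresh = [
    pipeline.process_item(item, spider=None)
    for item in build_fresh_items(daily, future)
]
```

When the reused dict is collected, both list entries point at one object holding only the futures data, whereas the fresh items stay independent and the pipeline can branch on their type with `isinstance`.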
