Yield multiple items using scrapy
Problem description
I'm scraping data from the following URL:
http://www.indexmundi.com/commodities/?commodity=gasoline
There are two sections that contain prices: Gulf Coast Gasoline Futures End of Day Settlement Price and Gasoline Daily Price.
I want to scrape data from both sections as two different items. Here is the code which I've written:
```python
if dailyPrice:
    item['description'] = u''.join(dailyPrice.xpath(".//h1/text()").extract())
    item['price'] = u''.join(dailyPrice.xpath(".//span/text()").extract())
    item['unit'] = dailyPrice.xpath(".//div/p/text()").extract()[0].split(',')[-1]
    regex = re.compile("Source:(.*)", re.IGNORECASE | re.UNICODE)
    result = re.search(regex, u''.join(dailyPrice.xpath(".//div/p/text()").extract()))
    if result:
        item['source'] = result.group(1).strip()
    yield item
if futurePrice:
    item['description'] = u''.join(futurePrice.xpath(".//h1/text()").extract())
    item['price'] = u''.join(futurePrice.xpath(".//span/text()").extract())
    item['unit'] = u''.join(futurePrice.xpath(".//div[2]/table//tr[1]/td/text()").extract())
    source = futurePrice.xpath(".//div[2]/table//tr[4]/td/a/text()").extract()
    if source:
        item['source'] = u' - '.join(source)
    else:
        item['source'] = ''
    yield item
```
I want to know whether this code will work, or what the correct way to do this is.
Answer

It should work just fine. You can yield as many items from a `parse` callback as you need. Just some notes:
- In the second case, it's better to create a new item than to reuse the old one, because you never know what has happened to the old item reference after it was yielded. You may be overwriting it and losing the previous data.
- You can create different item types for your two cases and treat them differently in the pipeline.
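The risk in the first note can be shown without Scrapy at all. In this minimal sketch, plain dicts stand in for items (Scrapy accepts dicts as items), the function names and the `SECTIONS` placeholder data are hypothetical, and `list()` plays the role of a consumer that keeps references to yielded objects. Re-yielding one mutated object leaves every reference pointing at the last values, while building a fresh object per section keeps them apart.

```python
def parse_reused(sections):
    """Mimics the question's pattern: one item object is filled and yielded twice."""
    item = {}
    for description, price in sections:
        item['description'] = description
        item['price'] = price
        yield item  # the same dict object is yielded every time


def parse_fresh(sections):
    """Builds a new item for every section, as the answer suggests."""
    for description, price in sections:
        item = {}  # fresh object per yield, so earlier items are never overwritten
        item['description'] = description
        item['price'] = price
        yield item


# Hypothetical placeholder data standing in for the two page sections.
SECTIONS = [
    ("Gasoline Daily Price", "daily-price-placeholder"),
    ("Gulf Coast Gasoline Futures End of Day Settlement Price", "futures-price-placeholder"),
]

if __name__ == "__main__":
    # list() holds on to every yielded reference before anything else runs.
    reused = list(parse_reused(SECTIONS))
    fresh = list(parse_fresh(SECTIONS))
    print(reused)  # both entries show the futures values
    print(fresh)   # each entry keeps its own values
```

Whether Scrapy itself would ever observe the overwritten data depends on how and when the pipelines process each item, which is exactly why a new object per yield is the safer habit.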
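The second note can be sketched as follows, assuming a Scrapy version that accepts dataclass objects as items (2.2 and later do; `scrapy.Item` subclasses declared with `scrapy.Field()` can be used the same way). The names `PriceFields`, `DailyPriceItem`, `FuturePriceItem` and `PricePipeline`, and the choice to collect each type into its own list, are hypothetical. The pipeline only relies on the documented `process_item(self, item, spider)` hook, so it runs here without Scrapy installed.

```python
from dataclasses import dataclass


@dataclass
class PriceFields:
    """Fields shared by both hypothetical item types."""
    description: str = ""
    price: str = ""
    unit: str = ""
    source: str = ""


@dataclass
class DailyPriceItem(PriceFields):
    """Hypothetical item for the "Gasoline Daily Price" section."""


@dataclass
class FuturePriceItem(PriceFields):
    """Hypothetical item for the futures settlement-price section."""


class PricePipeline:
    """Hypothetical pipeline that handles each item type differently.

    Once enabled through the ITEM_PIPELINES setting, Scrapy calls
    process_item(item, spider) for every scraped item.
    """

    def __init__(self):
        self.daily = []    # items from the daily-price section
        self.futures = []  # items from the futures section

    def process_item(self, item, spider):
        # Branch on the item's type instead of inspecting its fields.
        if isinstance(item, DailyPriceItem):
            self.daily.append(item)
        elif isinstance(item, FuturePriceItem):
            self.futures.append(item)
        # Returning the item hands it on to the next pipeline, if any.
        return item


if __name__ == "__main__":
    pipeline = PricePipeline()
    # In a spider, the parse callback would yield these two objects instead.
    for item in (DailyPriceItem(description="Gasoline Daily Price"),
                 FuturePriceItem(description="Gulf Coast Gasoline Futures End of Day Settlement Price")):
        pipeline.process_item(item, spider=None)
    print(len(pipeline.daily), len(pipeline.futures))  # 1 1
```

In the spider, the `if dailyPrice:` branch would then `yield DailyPriceItem(...)` and the `if futurePrice:` branch would `yield FuturePriceItem(...)`. Because each branch constructs its own object, this also follows the first note.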