Problems with passing global variables in a Python scrapy project
Problem description
In a Scrapy project I am working on, I am having difficulty passing a variable that contains a list from one function to another. I need to do this because, at the end of the script, I have to combine the values scraped from one page with those from another. The code is as follows:
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.http.request import Request
from dirbot.items import Website
from scrapy.contrib.spiders import CrawlSpider,Rule
from six import string_types
from datetime import datetime
from decimal import Decimal
import itertools
import numpy
import urlparse
import scrapy

class DmozSpider(Spider):
    name = "dnot"
    allowed_domains = ["ca.finance.yahoo.com", "http://eoddata.com/"]
    start_urls = [
        "http://eoddata.com/stocklist/TSX.htm"
    ]

    def parse(self,response):
        companyList = response.xpath('//tr[@class="ro"]/td/a/text()').extract()
        for company in companyList:
            go = 'https://ca.finance.yahoo.com/q/hp?s={0}.TO&a=02&b=2&c=2005&d=02&e=2&f=2015&g=m'.format(company)
            for link in go:
                yield Request(go, self.stocks1)

    def stocks1(self, response):
        # global returns_page1
        # EAFP = Easier to ask for forgiveness than permission
        # Gathers ONLY adjusted closing stock price
        global returns_page1
        returns_page1 = []
        rows = response.xpath('//table[@class="yfnc_datamodoutline1"]//table/tr')[1:]
        for row in rows:
            cells = row.xpath('.//td/text()').extract()
            try:
                datetime.strptime(cells[0], "%b %d, %Y")
                values = cells[-1]
                returns_page1.append(values)
            except ValueError:
                continue
        current_page = response.url
        next_page = current_page + "&z=66&y=66"
        yield Request(next_page, self.stocks2)

    def stocks2(self, response):
        item = Website()
        global returns_page1
        returns_page2 = []
        rows = response.xpath('//table[@class="yfnc_datamodoutline1"]//table/tr')[1:]
        for row in rows:
            cells = row.xpath('.//td/text()').extract()
            try:
                datetime.strptime(cells[0], "%b %d, %Y")
                values = cells[-1]
                returns_page2.append(values)
            except ValueError:
                continue
        returns_tot = returns_page1 + returns_page2
        returns_dec = [Decimal(float(i)) for i in returns_tot]
        returns = [float(n) for n in returns_dec]
        items = []
        item = Website()
        item['url'] = response.url
        item['name'] = response.xpath('//div[@class="title"]/h2/text()').extract()
        item['avgreturns'] = numpy.mean(returns)
        item['varreturns'] = numpy.var(returns)
        item['sdreturns'] = numpy.std(returns)
        item['returns'] = returns
        items.append(item)
        yield item
I am trying to combine returns_page1 from the def stocks1 function with returns_page2, which is gathered in the def stocks2 function. However, my output only contains the values from the returns_page2 variable.
I know I can't put a return in the def stocks1 function because I have a yield in there. That's why I tried using global variables.
What am I doing wrong here?
Recommended answer
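The best way to pass values from one callback to another is the meta dict of the Request. A module-level global is shared by every request the spider makes, and Scrapy handles many responses concurrently, so by the time a given stocks2 callback runs, returns_page1 may already have been overwritten by the stocks1 callback of a different company. Data stored in meta travels with its own request instead, and the next callback reads it back from response.meta.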
In the first function (stocks1):
yield Request(next_page, self.stocks2, meta={'returns_page1': returns_page1})
In the second function (stocks2):
returns_page1 = response.meta.get('returns_page1')
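To see why the global variable mixes data up while meta does not, here is a small simulation that runs without Scrapy. The Request and Response classes, the PAGES data, the company names and the crawl() scheduler are hypothetical stand-ins written only for this illustration, not Scrapy's API; they model just a URL, a callback and a meta dict. The scheduler handles every first page before any second page, which is one possible interleaving of concurrent responses in a real crawl, not the only one.

```python
from collections import deque


# Hypothetical stand-ins for scrapy.Request and a Scrapy response:
# only a URL, a callback and a meta dict are modelled.
class Request:
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})


class Response:
    def __init__(self, request):
        self.url = request.url
        self.meta = request.meta  # as in Scrapy, response.meta is the request's meta


# Hypothetical scraped values: two pages of returns per company.
PAGES = {
    "AAA/p1": [1.0, 2.0], "AAA/p2": [3.0],
    "BBB/p1": [10.0, 20.0], "BBB/p2": [30.0],
}


def crawl(first_callback, companies=("AAA", "BBB")):
    """Run callbacks breadth-first: all first pages are handled before any
    second page, one possible interleaving of concurrent responses."""
    queue = deque(Request(c + "/p1", first_callback) for c in companies)
    items = []
    while queue:
        request = queue.popleft()
        for result in request.callback(Response(request)):
            if isinstance(result, Request):
                queue.append(result)  # follow-up request goes to the back
            else:
                items.append(result)  # anything else is a scraped item
    return items


# Approach from the question: share page-1 data through a global.
returns_page1 = []


def stocks1_global(response):
    global returns_page1
    returns_page1 = PAGES[response.url]  # every company overwrites the same global
    company = response.url.split("/")[0]
    yield Request(company + "/p2", stocks2_global)


def stocks2_global(response):
    yield {"url": response.url, "returns": returns_page1 + PAGES[response.url]}


# Approach from the answer: carry page-1 data on the follow-up request.
def stocks1_meta(response):
    company = response.url.split("/")[0]
    yield Request(company + "/p2", stocks2_meta,
                  meta={"returns_page1": PAGES[response.url]})


def stocks2_meta(response):
    page1 = response.meta.get("returns_page1", [])
    yield {"url": response.url, "returns": page1 + PAGES[response.url]}


global_items = crawl(stocks1_global)
meta_items = crawl(stocks1_meta)
print(global_items[0])  # {'url': 'AAA/p2', 'returns': [10.0, 20.0, 3.0]}
print(meta_items[0])    # {'url': 'AAA/p2', 'returns': [1.0, 2.0, 3.0]}
```

With the global, company AAA's second page is combined with BBB's first page, because BBB's first callback ran in between; with meta, each company's two pages stay paired regardless of order. In a real spider the fix is just the two lines shown in the answer. Scrapy 1.7 and later also offer Request(..., cb_kwargs={...}), which delivers the values as keyword arguments of the callback instead of through response.meta.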