scrapy: request url must be str or unicode, got list


Problem description

I can't quite figure out what's wrong with this code. I would like to scrape the first page and then, for each link on that page, go to the second page to extract the item description. When I run the code below, I get: exceptions.TypeError: url must be str or unicode, got list. Here is my code:

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.item import Item, Field
from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import MapCompose,  Join
from scrapy.contrib.loader import XPathItemLoader
from my.items import myItem

class mySpider(Spider):
    name = "my"
    allowed_domains = ["my.com"]
    start_urls = ['http://sjg.my.com/cf_jy.cfm']

    def parse(self, response):
        s = Selector(response)
        rows = s.xpath('//table[@class="table-order"]//tr')
        for row in rows:
            l = XPathItemLoader(item=myItem(), selector=row)
            l.default_input_processor = MapCompose(unicode.strip)
            l.default_output_processor = Join()
            l.add_xpath('title', './/a[contains(@href,"cf_jy.cfm?hu_pg")]/text()')
            l.add_xpath('url1', './/a/@href')
            l.add_xpath('dates', './/td[4]/text()')
            l.add_xpath('rev', './/td[@align="right"]/text()')
            l.add_xpath('typ', './/td[3]/text()')
            l.add_value('name', u'gsf')
            request = Request(l.get_xpath('.//a/@href'), callback=self.parse_link,meta={'l':l})
            yield request      

    def parse_link(self, response):
        l = response.meta["l"]
        s = Selector(response)
        q = s.xpath("//div[@class='content-main']/td[@class='text']/p/text()").extract()
        l.add_value('description',q)
        yield l.load_item()

Thanks in advance.

Recommended answer

Scrapy's Request takes a string as its first argument (the URL). In your code, l.get_xpath('.//a/@href') returns a list, which is why the TypeError is raised. Pass a single string as the Request URL instead.

For example:

Request("Some_link_goes_here", callback=self.parse_link,meta={'l':l})
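
One way to get from the list to a single string is sketched below. This is only an illustration, not part of the original answer: the helper name first_url and the sample href value are hypothetical, and it assumes each table row carries at most one link of interest and that the href may be relative to the page URL.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, which the question's code targets


def first_url(values, base_url):
    """Return the first non-empty href in `values` as an absolute URL, or None.

    `values` is assumed to be a list of strings, like the one that
    l.get_xpath('.//a/@href') returns in the question.
    """
    for value in values:
        href = value.strip()
        if href:
            # Resolve relative links against the page they were found on.
            return urljoin(base_url, href)
    return None


# Hypothetical sample data: a row with one relative link.
hrefs = ['cf_jy.cfm?hu_pg=2']
url = first_url(hrefs, 'http://sjg.my.com/cf_jy.cfm')
print(url)

# A row without any link (for example a header row) gives an empty list,
# so there is no URL to request and the helper returns None.
print(first_url([], 'http://sjg.my.com/cf_jy.cfm'))
```

In the spider, this would presumably be used as url = first_url(l.get_xpath('.//a/@href'), response.url), followed by yielding Request(url, callback=self.parse_link, meta={'l': l}) only when url is not None. Skipping rows without a link avoids passing an empty value to Request.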
