Empty list returned by XPath in Scrapy

Problem description

I am working with Scrapy and trying to gather some data from a site.

Spider code

# Import paths as used by the Scrapy releases of 2012.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import FormRequest

# ExampleItem is the project's own Item class; its import is not shown in the question.


class NaaptolSpider(BaseSpider):
    name = "naaptol"
    domain_name = "www.naaptol.com"
    start_urls = ["http://www.naaptol.com/buy/mobile_phones/mobile_handsets.html"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        cell_matter = hxs.select('//div[@class="gridInfo"]/div[@class="gridProduct gridProduct_special"]')
        items = []
        for i in cell_matter:
            # A leading "." keeps each XPath relative to the current product node.
            cell_names = i.select('.//p[@class="proName"]/a/text()').extract()
            prices = i.select('.//p[@class="values"]/strong/text()').extract()
            item = ExampleItem()
            item['cell_name'] = cell_names
            item['price'] = prices
            items.append(item)
        # POST the form that the site's JavaScript pagination submits.
        return [FormRequest(
            url="http://www.naaptol.com/faces/jsp/search/searchResults.jsp",
            formdata={
                'type': 'cat_catlg',
                'catid': '27',
                'sb': '9,8',
                'frm': '1',
                'max': '15',
                'req': 'ajax',
            },
            callback=self.parse_item,
        )]

    def parse_item(self, response):
        # Same XPaths as in parse(); on this response they match nothing.
        hxs = HtmlXPathSelector(response)
        cell_matter = hxs.select('//div[@class="gridInfo"]/div[@class="gridProduct gridProduct_special"]')
        for i in cell_matter:
            cell_names = i.select('.//p[@class="proName"]/a/text()').extract()
            prices = i.select('.//p[@class="values"]/strong/text()').extract()
            print(cell_names)
            print(prices)

Result:

2012-06-15 09:38:36+0530 [naaptol] DEBUG: Crawled (200) <POST http://www.naaptol.com/faces/jsp/search/searchResults.jsp> (referer: http://www.naaptol.com/buy/mobile_phones/mobile_handsets.html)
[]
[]

I posted the form in order to follow the pagination, which the site implements in JavaScript.

The parse_item method receives the response to the request issued from parse, but when I use the same XPath there as in parse, it returns an empty list, as shown above. Can anyone tell me why it returns an empty list and what is wrong with my code?

Thanks in advance.

Recommended answer

The response is JSON, not HTML, so the XPath expressions have nothing to match:

{
  "prodList": [
    {
      "pid": "955492",
      "pnm": "Samsung Star 3 Duos",
      "mctid": "27",
      "pc": "5,650",
      "mrp": "6290",
      "pdc": "10",
      "pimg": "Samsung-Star-3-duos-1.jpg",
      "rt": "8",
      "prc": "1",
      "per": "Y",
      (...)
    },
    (...)
  ]
}

To parse it, you can use Python's json module. An example of what you are trying to achieve is in this related question: Empty list for hrefs to achieve pagination through JavaScript onclick functions.
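
As a minimal sketch of that suggestion, the helper below reads product entries out of a JSON body with the standard json module. The function name extract_products is hypothetical, and treating "pnm" as the product name and "pc" as the price is an assumption based on the sample values above; the sample body is a trimmed copy of that sample, not a real response.

```python
import json


def extract_products(body):
    """Return (name, price) pairs from a JSON response body.

    Assumes the layout shown in the answer: a top-level "prodList" array
    whose entries carry "pnm" and "pc" keys. Reading these as product
    name and price is a guess based on the sample values.
    """
    data = json.loads(body)
    products = []
    for entry in data.get("prodList", []):
        products.append((entry.get("pnm"), entry.get("pc")))
    return products


# Trimmed sample built from fields shown in the answer.
sample_body = (
    '{"prodList": [{"pid": "955492", "pnm": "Samsung Star 3 Duos", '
    '"pc": "5,650", "mrp": "6290"}]}'
)

for name, price in extract_products(sample_body):
    print(name, price)
```

Inside the spider, parse_item could pass response.body to a helper like this instead of building an HtmlXPathSelector, since the selector only makes sense for HTML or XML responses.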
