Empty list for hrefs to achieve pagination through JavaScript onclick functions

Question

My intention is to handle pagination that is driven by JavaScript functions. For example, I am using the URL http://events.justdial.com/events/index.php?city=Hyderabad, which shows pagination at the end of the page. If you look at the HTML, the pagination links are written through JavaScript functions and their href attributes are just #. I am trying to collect those anchor tags even though their href is #. The following is my code:

class justdialdotcomSpider(BaseSpider):
   name = "justdialdotcom"
   allowed_domains = ["www.justdial.com"]
   start_urls = ["http://events.justdial.com/events/index.php?city=Hyderabad"]

   def parse(self, response):
       hxs = HtmlXPathSelector(response)
       pagination = hxs.select('//div[@id="main"]/div[@id="content"]/div[@id="pagination"]/a').extract()
       print pagination,">>>>>>>>>>>>>>>>>."

When I run the above code I get [] as the result, that is, nothing. Can anyone tell me how to achieve pagination through those JavaScript onclick functions, and why the result is empty? I also noticed something weird in the HTML: for example, one of the pages in the pagination has the anchor tag <a onclick="jdevents.setPageNo(2)" href="#">2</a>, but when I view the page source in the browser I can't find any function named jdevents.setPageNo(2). (I expected that if we could see what it does in the HTML, we could send the same thing as a request with formdata.) I am really confused and unable to get past this.

Recommended answer

If you track the requests the page makes, you'll find POST requests to the following URL: http://events.justdial.com/events/search.php

Post data:

city:Hyderabad 
cat:0 
area:0 
fromDate: 
toDate: 
subCat:0 
pageNo:2
fetch:events

The response is in JSON format.
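As a small illustration of the POST data listed above, here is a sketch that builds those fields as a dictionary of strings and URL-encodes them. It uses only the Python 3 standard library and does not contact the site. The names `build_form_data` and `SEARCH_URL` are hypothetical helpers introduced for this example, not part of Scrapy or of the site's own code.

```python
from urllib.parse import urlencode

# Endpoint quoted in the answer; the constant name is an assumption.
SEARCH_URL = "http://events.justdial.com/events/search.php"


def build_form_data(page_no, city="Hyderabad", fetch="events"):
    """Return the POST fields listed above as a dict of strings.

    Hypothetical helper: every value is a string because form fields
    are sent as text.
    """
    return {
        "city": city,
        "cat": "0",
        "area": "0",
        "fromDate": "",
        "toDate": "",
        "subCat": "0",
        "pageNo": str(page_no),
        "fetch": fetch,
    }


# URL-encoded request body for page 2 of the events listing.
encoded_body = urlencode(build_form_data(2))
```

A dictionary like this could be passed as the `formdata` argument of a Scrapy `FormRequest`, which is how the spider in this answer sends it.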

So, your code should be the following:

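import json

# Import paths for the older Scrapy releases that provide BaseSpider
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest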

class justdialdotcomSpider(BaseSpider):
    name = "justdialdotcom"
    allowed_domains = ["justdial.com"]
    start_urls = ["http://events.justdial.com/events/search.php"]


    # Initial request
    def parse(self, response):
        return [FormRequest(url="http://events.justdial.com/events/search.php",
                                        formdata={'fetch': 'area',
                                                  'pageNo': '1',
                                                  'city' : 'Hyderabad',
                                                  'cat' : '0',
                                                  'area' : '0',
                                                  'fromDate': '',
                                                  'toDate' : '',
                                                  'subCat' : '0'
                                                  },
                                        callback=self.area_count
                                        )]


    # Get total count and paginate through events
    def area_count(self, response):
        total_count = 0
        for area in json.loads(response.body):
            total_count += int(area["count"])

        # Ceiling division, assuming 10 events per page
        pages_count = (total_count + 9) // 10

        page = 1
        while page <= pages_count:
            yield FormRequest(url="http://events.justdial.com/events/search.php",
                                        formdata={'fetch': 'events',
                                                  'pageNo': str(page),
                                                  'city' : 'Hyderabad',
                                                  'cat' : '0',
                                                  'area' : '0',
                                                  'fromDate': '',
                                                  'toDate' : '',
                                                  'subCat' : '0'
                                                  },
                                        callback=self.parse_events
                                        )
            page += 1


    # Parse the events list
    def parse_events(self, response):
        events = json.loads(response.body)
        events.pop(0)

        for event_details in events:
            yield FormRequest(url="http://events.justdial.com/events/search.php",
                                        formdata={'fetch': 'event',
                                                  'eventId': str(event_details["id"]),
                                                  },
                                        callback=self.parse_event
                                        )



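    # Parse a single event
    def parse_event(self, response):
        event_details = json.loads(response.body)
        # Product stands in for your project's Item class (define and import it),
        # then copy the fields you need from event_details into the item.
        item = Product()
        return [item]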
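The number of pages requested in area_count comes from the summed per-area counts. The following is a minimal sketch of that arithmetic in isolation, assuming 10 events per page as the spider does. The function names `page_count` and `page_numbers` and the sample counts are hypothetical and only serve the illustration.

```python
def page_count(total_count, per_page=10):
    """Pages needed to show total_count items, using ceiling division."""
    if total_count <= 0:
        return 0
    return (total_count + per_page - 1) // per_page


def page_numbers(area_counts, per_page=10):
    """Sum per-area counts (ints or numeric strings) and list the
    1-based page numbers that would need to be requested."""
    total = sum(int(count) for count in area_counts)
    return list(range(1, page_count(total, per_page) + 1))


# Made-up sample counts: 12 + 9 = 21 events in total.
sample_pages = page_numbers(["12", "9"])
```

Ceiling division avoids requesting one page too many when the total is an exact multiple of the page size. The page size of 10 is an assumption carried over from the spider, so it is worth checking against the actual responses.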
