Empty list for hrefs to achieve pagination through JavaScript onclick functions
Problem description
My intention is to follow pagination that is implemented with JavaScript functions. For example, I am using the URL http://events.justdial.com/events/index.php?city=Hyderabad. The pagination appears at the end of that page, and if you inspect its HTML you will see that the links are written by JavaScript functions and their href attributes are all #. I am just trying to collect those anchor tags even though their href is #.
The following is my code:
class justdialdotcomSpider(BaseSpider):
    name = "justdialdotcom"
    allowed_domains = ["www.justdial.com"]
    start_urls = ["http://events.justdial.com/events/index.php?city=Hyderabad"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        pagination = hxs.select('//div[@id="main"]/div[@id="content"]/div[@id="pagination"]/a').extract()
        print pagination,">>>>>>>>>>>>>>>>>."
When I run the above code, the result is [], that is, an empty list. Can anyone tell me how to follow pagination that is driven by JavaScript onclick functions, and why the result is empty? I also noticed something weird in the HTML: for example, one of the pages in the pagination has the anchor tag <a onclick="jdevents.setPageNo(2)" href="#">2</a>, but when I look at the page through the browser's "view page source" I cannot find any function named jdevents.setPageNo. (I expected that if we could see what the function does, we could send the same thing as a request with formdata.) I am really confused and unable to get past this.
Recommended answer
If you track the requests the page makes, you will find POST requests to the following URL: http://events.justdial.com/events/search.php
POST data:
city:Hyderabad
cat:0
area:0
fromDate:
toDate:
subCat:0
pageNo:2
fetch:events
The response is in JSON format.
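As a quick way to check this outside Scrapy, the same request can be reproduced with the Python standard library. This is only a minimal sketch: the helper names build_form_data and fetch_page are made up for illustration, the field values are copied from the POST data above, and the site may have changed since or may require extra headers or cookies.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "http://events.justdial.com/events/search.php"


def build_form_data(page_no, city="Hyderabad"):
    """Return the form fields listed above for one page of results."""
    return {
        "city": city,
        "cat": "0",
        "area": "0",
        "fromDate": "",
        "toDate": "",
        "subCat": "0",
        "pageNo": str(page_no),
        "fetch": "events",
    }


def fetch_page(page_no):
    """POST the form and decode the JSON reply (performs a network call)."""
    body = urlencode(build_form_data(page_no)).encode("ascii")
    request = Request(SEARCH_URL, data=body)  # passing data= makes this a POST
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the encoded request body; call fetch_page(2) to hit the site.
    print(urlencode(build_form_data(2)))
```

Keeping the form fields in one helper means that only pageNo changes from page to page, which mirrors what the onclick handler appears to do in the browser.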
So your code should look like the following:
import json

# Import paths for the older Scrapy API (BaseSpider) used in the question
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest


class justdialdotcomSpider(BaseSpider):
    name = "justdialdotcom"
    # The bare domain also covers the events.justdial.com subdomain
    allowed_domains = ["justdial.com"]
    start_urls = ["http://events.justdial.com/events/search.php"]

    # Initial request: ask for the per-area event counts
    def parse(self, response):
        return [FormRequest(url="http://events.justdial.com/events/search.php",
                            formdata={'fetch': 'area',
                                      'pageNo': '1',
                                      'city': 'Hyderabad',
                                      'cat': '0',
                                      'area': '0',
                                      'fromDate': '',
                                      'toDate': '',
                                      'subCat': '0'
                                      },
                            callback=self.area_count
                            )]

    # Get total count and paginate through events
    def area_count(self, response):
        total_count = 0
        for area in json.loads(response.body):
            total_count += int(area["count"])
        # Assumes 10 events per page; integer division keeps this a whole
        # number. When the total is an exact multiple of 10 this requests
        # one extra page.
        pages_count = (total_count // 10) + 1
        for page in range(1, pages_count + 1):
            yield FormRequest(url="http://events.justdial.com/events/search.php",
                              formdata={'fetch': 'events',
                                        'pageNo': str(page),
                                        'city': 'Hyderabad',
                                        'cat': '0',
                                        'area': '0',
                                        'fromDate': '',
                                        'toDate': '',
                                        'subCat': '0'
                                        },
                              callback=self.parse_events
                              )

    # Parse one page of events and request the details of each event
    def parse_events(self, response):
        events = json.loads(response.body)
        events.pop(0)  # the first element is skipped, as in the original answer
        for event_details in events:
            yield FormRequest(url="http://events.justdial.com/events/search.php",
                              formdata={'fetch': 'event',
                                        'eventId': str(event_details["id"]),
                                        },
                              callback=self.parse_event
                              )

    # Parse the details of a single event
    def parse_event(self, response):
        event_details = json.loads(response.body)
        items = []
        # Build your own item from event_details here, for example:
        # item = Product()
        # items.append(item)
        return items
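The page arithmetic in area_count can be checked in isolation. The sketch below assumes, as the answer does, 10 events per page and a JSON list of objects that each carry a "count" field. The sample payload is invented for illustration, and the helper names pages_needed and total_from_areas are hypothetical. It uses ceiling division, so a total that is an exact multiple of 10 does not produce an extra, empty page.

```python
import json


def pages_needed(total_count, per_page=10):
    """Ceiling division: pages required to show total_count items."""
    return (total_count + per_page - 1) // per_page


def total_from_areas(body):
    """Sum the "count" field over a JSON list of area objects."""
    return sum(int(area["count"]) for area in json.loads(body))


# Invented sample payload, shaped like the area response the spider expects.
sample_body = '[{"count": "12"}, {"count": "8"}, {"count": "5"}]'
total = total_from_areas(sample_body)
print(total, pages_needed(total))  # prints: 25 3
```

If the site uses a different page size, only the per_page argument needs to change.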