Scraping site that uses AJAX
Question
I've read some relevant posts here but couldn't figure out an answer.
I'm trying to crawl a web page with reviews. When the site is visited, only 10 reviews are shown at first, and the user has to press "Show more" to get 10 more reviews (which also appends #add10 to the end of the site's address) every time they scroll to the end of the review list. In fact, a user can get the full review list by appending #add1000 (where 1000 is the number of additional reviews) to the end of the site's address. The problem is that in my spider site_url#add1000 returns only the first 10 reviews, just like site_url, so this approach doesn't work.
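The reason the #add1000 trick fails outside the browser is that the URL fragment (everything after #) is never sent to the server; the site's own JavaScript reads it client-side to decide how many reviews to render. The standard library makes this easy to see (the URL below is a placeholder, not the real site):

```python
from urllib.parse import urlsplit

# Hypothetical review-page URL; only path and query are sent in the HTTP request.
parts = urlsplit("http://site_url/reviews#add1000")
print(parts.path)      # "/reviews"  -- what the server actually receives
print(parts.fragment)  # "add1000"   -- stays in the browser, invisible to the server
```

So from the server's point of view, site_url and site_url#add1000 are the same request.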
I also can't find a way to make an appropriate Request imitating the original one from the site. The original AJAX URL is of the form 'domain/ajaxlst?par1=x&par2=y', and I tried all of the following:
Request(url='domain/ajaxlst?par1=x&par2=y', callback=self.parse_all)
Request(url='domain/ajaxlst?par1=x&par2=y', callback=self.parse_all,
headers={all_headers})
Request(url='domain/ajaxlst?par1=x&par2=y', callback=self.parse_all,
headers={all_headers}, cookies={all_cookies})
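One common cause of a 404 in attempts like these is passing a scheme-less URL such as 'domain/ajaxlst?...' instead of the full absolute URL as captured in the browser's Network tab. A hedged sketch of building the absolute AJAX URL with the standard library, where the host, path, and parameters are placeholders taken from the question:

```python
from urllib.parse import urlencode, urljoin

base = "http://domain/"              # placeholder -- use the real scheme and host
params = {"par1": "x", "par2": "y"}  # placeholder parameters from the question
ajax_url = urljoin(base, "ajaxlst") + "?" + urlencode(params)
print(ajax_url)  # http://domain/ajaxlst?par1=x&par2=y
```

The resulting ajax_url can then be passed to scrapy.Request. Note that many AJAX endpoints also check request headers (e.g. X-Requested-With: XMLHttpRequest or a Referer), so comparing your spider's headers against the browser's captured request is worth doing before reaching for a browser-based solution.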
But every time I get a 404 error. Can anyone explain what I'm doing wrong?
Recommended answer
What you need here is a headless browser, since the requests module cannot handle AJAX well.
One such headless browser is Selenium.
e.g.:
driver.find_element_by_id("show more").click()  # just an example; in Selenium 4+ use driver.find_element(By.ID, "show more")
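A fuller sketch of this approach for the reviews page, assuming hypothetical element ids and selectors (the real ones must be taken from the page with the browser's inspector); it requires a running browser and a matching driver binary:

```python
# Hedged sketch: URL, element id, and CSS selector are assumptions, not the real site's.
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()   # or webdriver.Chrome()
driver.get("http://site_url")  # placeholder URL from the question

while True:
    try:
        # "show-more" is a hypothetical id; inspect the page for the real one.
        driver.find_element(By.ID, "show-more").click()
        time.sleep(1)          # crude wait; WebDriverWait is more robust
    except NoSuchElementException:
        break                  # no more "Show more" button -> all reviews loaded

reviews = driver.find_elements(By.CSS_SELECTOR, ".review")  # hypothetical selector
driver.quit()
```

Once every review is loaded, the page source (driver.page_source) can be handed to your usual parsing code.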