刮使用python AJAX页面 [英] Scraping ajax pages using python

查看:146
本文介绍了刮使用python AJAX页面的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经看到这个问题大约刮阿贾克斯,但蟒蛇没有被提及存在。我认为使用 scrapy 的,我相信他们对这个问题的一些文件,但你可以看到该网站已关闭。所以我不知道该怎么办。我要做到以下几点:

I've already seen this question about scraping ajax, but python isn't mentioned there. I considered using scrapy, i believe they have some docs on that subject, but as you can see the website is down. So i don't know what to do. I want to do the following:

我只有一个网址,example.com你从网页点击提交,该URL,因为他们正在使用AJAX来显示内容不会改变进入页面。我想刮每一页的内容,该怎么办呢?

I only have one url, example.com you go from page to page by clicking submit, the url doesn't change since they're using ajax to display the content. I want to scrape the content of each page, how to do it?

让我们说,我想刮只有数字,还有什么比scrapy等,将做到这一点?如果没有,你给我如何做一个片段,只是因为他们的网站已关闭,所以我不能达到的文档。

Lets say that i want to scrape only the numbers, is there anything other than scrapy that would do it? If not, would you give me a snippet on how to do it, just because their website is down so i can't reach the docs.

推荐答案

首先,scrapy文档可在的https:/ /scrapy.readthedocs.org/en/latest/

First of all, scrapy docs are available at https://scrapy.readthedocs.org/en/latest/.

在谈到处理AJAX,而网页抓取。基本上,这个想法很简单:

Speaking about handling ajax while web scraping. Basically, the idea is rather simple:

  • open browser developer tools, network tab
  • go to the target site
  • click submit button and see what XHR request is going to the server
  • simulate this XHR request in your spider

另请参见:

  • Can scrapy be used to scrape dynamic content from websites that are using AJAX?
  • Pagination using scrapy

希望有所帮助。

这篇关于刮使用python AJAX页面的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆