使用python抓取ajax页面 [英] Scraping ajax pages using python

查看:39
本文介绍了使用python抓取ajax页面的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经看过这个关于抓取ajax的问题​​,但那里没有提到python.我考虑过使用 scrapy,我相信他们有关于该主题的一些文档,但正如您所看到的,该网站已关闭.所以我不知道该怎么办.我想做以下事情:

I've already seen this question about scraping ajax, but python isn't mentioned there. I considered using scrapy, i believe they have some docs on that subject, but as you can see the website is down. So i don't know what to do. I want to do the following:

我只有一个 url,example.com,您可以通过单击提交从一个页面转到另一个页面,该 url 不会更改,因为他们使用 ajax 来显示内容.我想抓取每个页面的内容,怎么做?

I only have one url, example.com you go from page to page by clicking submit, the url doesn't change since they're using ajax to display the content. I want to scrape the content of each page, how to do it?

假设我只想抓取数字,除了scrapy之外还有什么可以做到的吗?如果没有,你能给我一个关于如何做的片段,只是因为他们的网站已经关闭,所以我无法访问文档.

Lets say that i want to scrape only the numbers, is there anything other than scrapy that would do it? If not, would you give me a snippet on how to do it, just because their website is down so i can't reach the docs.

推荐答案

首先,scrapy 文档可在 https://scrapy.readthedocs.org/en/latest/.

First of all, scrapy docs are available at https://scrapy.readthedocs.org/en/latest/.

谈到在网页抓取时处理 ajax.基本上,这个想法相当简单:

Speaking about handling ajax while web scraping. Basically, the idea is rather simple:

  • 打开浏览器开发者工具,网络标签
  • 转到目标站点
  • 点击提交按钮,看看 XHR 请求 会做什么服务器
  • 在你的蜘蛛中模拟这个 XHR 请求
  • open browser developer tools, network tab
  • go to the target site
  • click submit button and see what XHR request is going to the server
  • simulate this XHR request in your spider

另见:

希望有所帮助.

这篇关于使用python抓取ajax页面的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆