Scraping ajax pages using python
Question
I've already seen this question about scraping ajax pages, but Python isn't mentioned there. I considered using Scrapy, and I believe they have some documentation on the subject, but as you can see the website is down, so I don't know what to do. I want to do the following:
I only have one URL, example.com. You go from page to page by clicking submit, and the URL doesn't change, since they're using ajax to display the content. I want to scrape the content of each page. How can I do that?
Let's say I want to scrape only the numbers. Is there anything other than Scrapy that would do it? If not, could you give me a snippet showing how? I ask only because their website is down, so I can't reach the docs.
Answer
First of all, the Scrapy docs are available at https://scrapy.readthedocs.org/en/latest/.
As for handling ajax while web scraping, the idea is rather simple:
- open your browser's developer tools, network tab
- go to the target site
- click the submit button and see what XHR request goes to the server
- simulate this XHR request in your spider
Hope that helps.