刮使用python AJAX页面 [英] Scraping ajax pages using python
问题描述
我已经看到这个问题大约刮阿贾克斯,但蟒蛇没有被提及存在。我认为使用 scrapy 的,我相信他们对这个问题的一些文件,但你可以看到该网站已关闭。所以我不知道该怎么办。我要做到以下几点:
I've already seen this question about scraping ajax, but python isn't mentioned there. I considered using scrapy, i believe they have some docs on that subject, but as you can see the website is down. So i don't know what to do. I want to do the following:
我只有一个网址,example.com你从网页点击提交,该URL,因为他们正在使用AJAX来显示内容不会改变进入页面。我想刮每一页的内容,该怎么办呢?
I only have one url, example.com you go from page to page by clicking submit, the url doesn't change since they're using ajax to display the content. I want to scrape the content of each page, how to do it?
让我们说,我想刮只有数字,还有什么比scrapy等,将做到这一点?如果没有,你给我如何做一个片段,只是因为他们的网站已关闭,所以我不能达到的文档。
Lets say that i want to scrape only the numbers, is there anything other than scrapy that would do it? If not, would you give me a snippet on how to do it, just because their website is down so i can't reach the docs.
推荐答案
首先,scrapy文档可在的https:/ /scrapy.readthedocs.org/en/latest/ 。
First of all, scrapy docs are available at https://scrapy.readthedocs.org/en/latest/.
在谈到处理AJAX,而网页抓取。基本上,这个想法很简单:
Speaking about handling ajax while web scraping. Basically, the idea is rather simple:
- 在打开的浏览器的开发工具,网络标签
- 进入目标网站
- 单击提交按钮,看看
XHR
要求一>打算在服务器 - 模拟你的蜘蛛这个
XHR
要求
- open browser developer tools, network tab
- go to the target site
- click submit button and see what
XHR
request is going to the server - simulate this
XHR
request in your spider
另请参见:
- <一个href="http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax">Can scrapy用于从正在使用AJAX技术的网站刮动态内容?
- <一个href="http://stackoverflow.com/questions/16129071/pagination-using-scrapy/16133788#16133788">Pagination使用scrapy
- Can scrapy be used to scrape dynamic content from websites that are using AJAX?
- Pagination using scrapy
希望有所帮助。
这篇关于刮使用python AJAX页面的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!