Scrapy can't crawl all links in a page
Problem description
I am trying to use Scrapy to crawl an AJAX website: http://play.google.com/store/apps/category/GAME/collection/topselling_new_free
I want to get all the links that point to the individual games.
I inspected the page's elements, and based on what I saw I want to extract all links matching the pattern /store/apps/details?id=
But when I ran the commands in the Scrapy shell, they returned nothing.
I also tried //a/@href. That didn't work either, and I don't know what is going wrong.
- Now I can crawl the first 120 links after modifying the start URL and adding "formdata" as someone suggested, but there are no more links after that.
Can someone help me?
Recommended answer
The data on that page is actually populated by an AJAX POST request, so you won't see it in the Scrapy shell when you fetch the page itself. Instead of using Inspect Element, check the Network tab of your browser's developer tools; you will find the request there.
Make a POST request to the URL https://play.google.com/store/apps/category/GAME/collection/topselling_new_free?authuser=0 with
formdata={'start':'0','num':'60','numChildren':'0','ipf':'1','xhr':'1'}
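The POST request described above can be sketched as follows. This is a minimal illustration, not the answerer's code: it uses only the Python standard library so that it runs without Scrapy installed, and it builds the request without sending it. The helper name `build_post_request` is hypothetical, and the `Content-Type` header is an assumption about how the form is encoded.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL taken from the answer above.
URL = ("https://play.google.com/store/apps/category/GAME/"
       "collection/topselling_new_free?authuser=0")


def build_post_request(start=0, num=60):
    """Build (but do not send) the form-encoded POST request.

    Hypothetical helper: the form field names and values come from the
    answer; everything else is an assumption for illustration.
    """
    formdata = {
        "start": str(start),
        "num": str(num),
        "numChildren": "0",
        "ipf": "1",
        "xhr": "1",
    }
    body = urlencode(formdata).encode("ascii")
    # Assumed header: standard encoding for HTML form submissions.
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(URL, data=body, headers=headers, method="POST")


request = build_post_request()
print(request.get_method(), request.data)
```

Inside a Scrapy spider, `scrapy.FormRequest(url, formdata=..., callback=...)` plays the same role: it encodes the dictionary as form data and issues a POST.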
Increment start by 60 on each request to get the paginated results.
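The pagination step might look like the sketch below. It assumes a caller-supplied `fetch` function that sends the form data and returns the response HTML as a string; `fetch`, `crawl_all_pages`, `extract_detail_links`, and the `max_pages` safety limit are all hypothetical names introduced for illustration. The loop stops when a page yields no new links, which is an assumed stop condition rather than documented server behaviour. `fake_fetch` is a stand-in that makes the example runnable offline.

```python
import re

# Matches hrefs that follow the pattern mentioned in the question.
DETAIL_LINK = re.compile(r'href="(/store/apps/details\?id=[^"&]+)')


def extract_detail_links(html):
    """Return all game detail links found in one response body."""
    return DETAIL_LINK.findall(html)


def crawl_all_pages(fetch, page_size=60, max_pages=50):
    """Request page after page, advancing 'start' by page_size each time.

    'fetch' is a hypothetical callable: it takes the form data dict and
    returns the response HTML. Stops when a page brings no new links.
    """
    ordered, seen = [], set()
    for page in range(max_pages):
        formdata = {
            "start": str(page * page_size),
            "num": str(page_size),
            "numChildren": "0",
            "ipf": "1",
            "xhr": "1",
        }
        new_links = [link for link in extract_detail_links(fetch(formdata))
                     if link not in seen]
        if not new_links:
            break
        for link in new_links:
            if link not in seen:  # also drops duplicates within a page
                seen.add(link)
                ordered.append(link)
    return ordered


def fake_fetch(formdata):
    """Offline stand-in for the real request: serves two pages of 60 links."""
    start = int(formdata["start"])
    if start >= 120:
        return ""
    return "".join(f'<a href="/store/apps/details?id=game{i}">g</a>'
                   for i in range(start, start + 60))


links = crawl_all_pages(fake_fetch)
print(len(links), links[0], links[-1])
```

In a Scrapy spider the same idea is usually expressed in the callback: parse the links from the response, then yield another `scrapy.FormRequest` with `start` increased by 60 until a response contains no links.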