How to scrape a JavaScript site using PHP and cURL
Question
The site is http://www.oferta.pl/strona_v2/gazeta_v2/ . It is built entirely on JavaScript. I want to scrape it using PHP and cURL, and I currently use DOMXPath. The left menu has some categories to select, but I see no 'form' there. How can I use cURL to submit that selection and scrape the resulting page?
So far I have only used file_get_contents(). It doesn't get the whole page. How can I proceed?
N.B.: I found this example, http://www.html-form-guide.com/php-form/php-form-submit.html , which has a 'form'. But the site I specified has no 'form'.
Recommended answer
Scraping it is possible, but it is very hard.
- Simulate the HTTP requests with cURL. Check every request the page makes via AJAX and try to reproduce it.
- Simulate the JavaScript execution (this part is almost impossible). Some requests contain values that are generated by JavaScript, so you need to compute them in PHP. If the site implements a complicated algorithm in JS, you can invoke the V8 JavaScript engine.
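As a rough illustration of the first step, here is a minimal sketch of replaying one AJAX request with cURL and reading the response with DOMXPath. The endpoint URL, the parameter names (`category`, `page`), the use of POST, and the helper names are all assumptions made for the example. The real values have to be read from the browser's network tab while clicking a category on the site.

```php
<?php
// Hypothetical endpoint: replace it with the URL seen in the browser's
// network tab when a category is clicked.
const AJAX_URL = 'http://www.example.com/ajax/list.php';

/**
 * Build the cURL options for replaying an AJAX POST request.
 * The parameter names passed in $params are assumptions, not the site's real ones.
 */
function buildAjaxOptions(string $url, array $params): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($params),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
        // Many sites check this header to recognise AJAX calls.
        CURLOPT_HTTPHEADER     => ['X-Requested-With: XMLHttpRequest'],
    ];
}

/**
 * Send the request and return the raw response body.
 */
function fetchAjax(string $url, array $params): string
{
    $ch = curl_init();
    curl_setopt_array($ch, buildAjaxOptions($url, $params));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // The handle is released when $ch goes out of scope.
    return $body;
}

/**
 * Extract the trimmed text of every link in an HTML fragment.
 */
function extractLinkTexts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query('//a') as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

A call might look like `extractLinkTexts(fetchAjax(AJAX_URL, ['category' => 5, 'page' => 1]))`, assuming the endpoint returns an HTML fragment. If it returns JSON instead, `json_decode()` would replace the DOMXPath step.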