Scrape web page data generated by JavaScript

Question

My question is: how can I scrape data from this website, http://vtis.vn/index.aspx? The data is not shown until you click on, for example, "Danh sách chậm". I have looked into this carefully: clicking "Danh sách chậm" fires an onclick event that triggers some JavaScript functions, one of which fetches the data from the server and inserts it into a tag/placeholder. At that point you can use something like Firefox to examine the data, and yes, the data is displayed to users on the web page. So again, how can we scrape this data programmatically?

I wrote a scraping function, but of course it does not get the data I want, because the data is not available until I click the "Danh sách chậm" button.

    <?php
    // Fetch the static HTML. Content injected later by JavaScript is not included.
    $Page = file_get_contents('http://vtis.vn/index.aspx');

    $dom_document = new DOMDocument();
    // Suppress warnings caused by malformed real-world HTML.
    @$dom_document->loadHTML($Page);
    $dom_xpath = new DOMXPath($dom_document);
    $elements = $dom_xpath->query("//td[@class='IconMenuColumn']");

    foreach ($elements as $element) {
        foreach ($element->childNodes as $node) {
            // DOM strings are UTF-8; convert them for ISO-8859-1 output.
            echo mb_convert_encoding($node->C14N(), 'ISO-8859-1', 'UTF-8');
        }
    }

Thank you kindly, StackOverflow is a great place. D.

Recommended answer

You need to look at PhantomJS.

From their site:


PhantomJS is a headless WebKit with JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

Using the API you can script the "browser" to interact with that page and scrape the data you need. You can then do whatever you need with it; including passing it to a PHP script if necessary.
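
As a rough illustration of that idea, a PhantomJS script might load the page, trigger the click, wait for the data to arrive, and print the rendered HTML. This is only a sketch under assumptions: it guesses that "Danh sách chậm" is an `<a>` element (it may be a different tag on the real page), uses a fixed 3-second wait instead of checking that the request finished, and dumps the whole body rather than a specific element. The names `clickLinkByText`, `getRenderedHtml`, `LINK_TEXT` and `WAIT_MS` are made up for this example.

```javascript
// Sketch of a PhantomJS script; run with: phantomjs scrape.js
// ASSUMPTIONS: the clickable item is an <a> element, and a fixed wait is
// long enough for the AJAX response. Neither is confirmed by the real page.
var TARGET_URL = 'http://vtis.vn/index.aspx';
var LINK_TEXT = 'Danh sách chậm';
var WAIT_MS = 3000; // hypothetical delay for the AJAX response

// Runs inside the page: finds the first <a> whose text contains `text`
// and dispatches a synthetic click on it. Returns true if one was found.
function clickLinkByText(text) {
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        if ((links[i].textContent || '').indexOf(text) !== -1) {
            var evt = document.createEvent('MouseEvents');
            evt.initEvent('click', true, true);
            links[i].dispatchEvent(evt);
            return true;
        }
    }
    return false;
}

// Runs inside the page: returns the HTML as it stands after scripts ran.
function getRenderedHtml() {
    return document.body.innerHTML;
}

// Only start the browser when executed by PhantomJS itself.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open(TARGET_URL, function (status) {
        if (status !== 'success') {
            console.log('Failed to load ' + TARGET_URL);
            phantom.exit(1);
            return;
        }
        if (!page.evaluate(clickLinkByText, LINK_TEXT)) {
            console.log('Link not found: ' + LINK_TEXT);
            phantom.exit(1);
            return;
        }
        // Give the page's AJAX call time to finish, then dump the DOM.
        setTimeout(function () {
            console.log(page.evaluate(getRenderedHtml));
            phantom.exit(0);
        }, WAIT_MS);
    });
}
```

The printed HTML could then be captured by a PHP script (for example through its standard output) and parsed with DOMDocument as in the question.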

That being said, if at all possible try not to "scrape" the data. If there is an ajax call the page is making, maybe there is an API you can use instead? If not, maybe you can convince them to make one. That would of course be much easier and more maintainable than screen scraping.
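
One way to find out whether such an ajax call exists is to log the requests the page makes when the link is clicked. The sketch below does this with PhantomJS's `page.onResourceRequested` callback. It rests on assumptions: the link is guessed to be an `<a>` element, the file-extension filter for static assets is a simple heuristic, and `summarizeRequest`, `STATIC_ASSET`, `LINK_TEXT` and `WAIT_MS` are names invented for this example.

```javascript
// Sketch: print each non-static request so the AJAX endpoint can be spotted.
// ASSUMPTIONS: the link is an <a> element, and the extension filter below
// is only a heuristic for hiding images, styles and scripts.
var TARGET_URL = 'http://vtis.vn/index.aspx';
var LINK_TEXT = 'Danh sách chậm';
var WAIT_MS = 3000; // hypothetical time allowed for requests to fire
var STATIC_ASSET = /\.(css|js|png|jpe?g|gif|ico|svg|woff2?)(\?.*)?$/i;

// Returns a one-line summary ("METHOD url"), or null for static assets.
function summarizeRequest(requestData) {
    if (STATIC_ASSET.test(requestData.url)) {
        return null;
    }
    return requestData.method + ' ' + requestData.url;
}

// Only start the browser when executed by PhantomJS itself.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.onResourceRequested = function (requestData) {
        var line = summarizeRequest(requestData);
        if (line) {
            console.log(line);
        }
    };
    page.open(TARGET_URL, function (status) {
        if (status !== 'success') {
            phantom.exit(1);
            return;
        }
        // Trigger the click so the page fires its AJAX request.
        page.evaluate(function (text) {
            var links = document.getElementsByTagName('a');
            for (var i = 0; i < links.length; i++) {
                if ((links[i].textContent || '').indexOf(text) !== -1) {
                    var evt = document.createEvent('MouseEvents');
                    evt.initEvent('click', true, true);
                    links[i].dispatchEvent(evt);
                    return;
                }
            }
        }, LINK_TEXT);
        setTimeout(function () { phantom.exit(0); }, WAIT_MS);
    });
}
```

If a request to a data endpoint shows up in the output, it may be possible to call that URL directly (for instance with `file_get_contents` in PHP) and skip the browser entirely, although the server might require specific headers or form fields.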
