How to get HTML source code after JavaScript transformation?

Question

For a school project, I am trying to build a website that shows your grades in a prettier way than the current one does. I have been able to log in to the site using cURL, and now I want to get the grades as a string so I can process them with PHP. The only problem is that cURL returns the HTML source as it is before the JavaScript that loads the grades has run.

So basically, I want the markup you see when you open Firebug or the inspector, as a string, so I can edit it with PHP.

Does anyone have an idea how to do this? I have seen several posts saying that you have to wait until the page has loaded, but I have no clue how to make my site wait for a third-party site to finish loading.

The code that I am waiting on, and whose result I want, is this:

<script type="text/javascript">
    var widgetWrapper = $("#objectWrapper325");
    if (widgetWrapper[0].timer !== undefined) {
        clearTimeout( jQuery('#objectWrapper325')[0].timer );
    }
    widgetWrapper[0].timer = setTimeout( function() {
        if (widgetWrapper[0].xhr !== undefined) {
            widgetWrapper[0].xhr.abort();
        }
        widgetWrapper[0].xhr = jQuery.ajax({
            type: 'GET',
            url: "",
            data: {
                "wis_ajax": 1,
                "ajax_object": 325,
                'llnr': '105629'
            },
            success: function(d) {
                var goodWidth = widgetWrapper.width();
                widgetWrapper.html(d);
                /* update width, needed for bug with standard template */
                $("#objectWrapper325 .result__overview").css('width', goodWidth - $("#objectWrapper325 .result__subjectlabels").width());
            }
        });
    }, 500+(Math.random()*1000));
</script>

Recommended answer

First you have to understand a subtle but very important difference between using cURL to get a web page and visiting that same page with your browser.

When you enter an address in the location bar, the browser resolves the host name in the URL to an IP address. It then contacts the web server at that address and asks for a web page. From then on, the browser speaks only HTTP with the web server. HTTP is a protocol made for carrying documents over a network. The browser is really asking the web server for an HTML document (a bunch of text), and the web server answers by sending that document back. If the page is static, the web server just picks an HTML file and sends it over the network. If the page is dynamic, the web server uses some higher-level code (like PHP) to generate the page and then sends it.

Once the web page has been downloaded, the browser parses it and interprets the HTML inside, which produces the actual page you see. During parsing, when the browser finds script tags, it interprets their content as JavaScript, a language used in the browser to manipulate the look of the page and do things inside the browser.

Remember, the web server only sent a web page containing HTML; to the server, the JavaScript in it is just more text.

So when you load a web page in a browser, the JavaScript is interpreted ONLY after it has been downloaded to the browser.
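To make that difference concrete, here is a rough Node.js sketch. Everything in it is made up for illustration: the `servedHtml` page, the helper names `fetchLikeCurl` and `renderLikeBrowser`, and the tiny stand-in for `document`. A real browser does far more than this; the point is only that the served text and the page after scripts have run are two different things.

```javascript
const vm = require('vm');

// Hypothetical page as a server might send it: an empty placeholder plus a script.
const servedHtml = [
  '<div id="grades"></div>',
  '<script>',
  '  document.getElementById("grades").innerHTML = "Math: 8";',
  '</script>'
].join('\n');

// What a plain HTTP client hands you: the served text, untouched.
function fetchLikeCurl(html) {
  return html;
}

// A very rough imitation of a browser: run each inline script against a
// minimal fake document, then write the results back into the markup.
function renderLikeBrowser(html) {
  const elements = {};
  const fakeDocument = {
    getElementById(id) {
      if (!elements[id]) elements[id] = { innerHTML: '' };
      return elements[id];
    }
  };

  const scriptPattern = /<script>([\s\S]*?)<\/script>/g;
  let match;
  while ((match = scriptPattern.exec(html)) !== null) {
    // Execute the script body with our fake document as its only global.
    vm.runInNewContext(match[1], { document: fakeDocument });
  }

  // Drop the script tags and fill each empty placeholder div.
  return html
    .replace(scriptPattern, '')
    .replace(/<div id="(\w+)"><\/div>/g, (whole, id) =>
      `<div id="${id}">${elements[id] ? elements[id].innerHTML : ''}</div>`)
    .trim();
}

console.log(fetchLikeCurl(servedHtml));     // still has the empty div and the script
console.log(renderLikeBrowser(servedHtml)); // the div now contains the grade
```

The first call prints the page exactly as it was sent; the second prints the markup with the placeholder filled in, which is the version the inspector would show you.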

If you take a look at the curl man page, you'll learn that curl is a tool for transferring data from or to servers that speak one of its supported protocols, and HTTP is one of them. When you download a page with curl, it downloads the page the same way your browser does, but it does not parse or interpret anything. cURL does not understand JavaScript or HTML; all it knows is how to speak to web servers.

So what you need in your case is to download the page as cURL does and also, somehow, have the JavaScript interpreted as if it were inside a browser.

If you have followed me up to here, then you're ready to take a look at CasperJS.
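As a side note on the script in the question: all it does is send one GET request with three parameters and put the response into the page. Since its `url` is an empty string, jQuery should send that request to the current page's own address. So it may be enough to work out that URL and request it directly with the same logged-in cURL session. The sketch below only builds the URL. The helper name `buildAjaxUrl` and the address `https://school.example/portal` are placeholders, and whether the server answers without extra headers or cookies is something you would have to check yourself.

```javascript
// Builds the URL that a GET request with the given parameters would hit.
// pageUrl is a placeholder for the address of the page containing the script.
function buildAjaxUrl(pageUrl, params) {
  const url = new URL(pageUrl);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}

// Parameters copied from the data object of the jQuery.ajax call in the question.
const gradesUrl = buildAjaxUrl('https://school.example/portal', {
  wis_ajax: 1,
  ajax_object: 325,
  llnr: '105629'
});

console.log(gradesUrl);
```

If the server accepts that request from your cURL session, the response body should be the same fragment the script inserts into `#objectWrapper325`. If it does not, a headless browser such as CasperJS remains the more general route.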
