Get HTML source code

Question

I am trying to put the HTML source code of any webpage into a string using JavaScript. Please tell me if I can do something else to solve my problem. I am using the following code, which I found in another post:

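function httpGet(theUrl)
{
    // The third argument "false" makes the request synchronous:
    // send() blocks until the whole response has arrived.
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open( "GET", theUrl, false );
    xmlHttp.send( null );
    // The body is returned whatever the HTTP status is (200, 404, ...).
    return xmlHttp.responseText;
}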

I tried this in IE, Firefox and Chrome, but I always get the following source code, which is the source of a "PAGE NOT FOUND" page. If you need any other info, please let me know in a comment. What I am trying to do is get the HTML of any webpage, like google.com and others. If I can't do that, then what can I do?

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head profile="http://gmpg.org/xfn/11">
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>404 - PAGE NOT FOUND</title>
            <style type="text/css">
            body{padding:0;margin:0;font-family:helvetica;}
            #container{margin:20px auto;width:868px;}
            #container #top404{background-image:url('http://74.53.143.237/images/404top.gif');background-repeat:no-repeat;width:868px;height:168px;}
            #container #mid404{background-image:url('http://74.53.143.237/images/404mid.gif');background-repeat:repeat-y;width:868px;}
            #container #mid404 #gatorbottom{position:relative;left:39px;float:left;}
            #container #mid404 #xxx{float:left;padding:40px 237px 10px;}
            #container #mid404 #content{float:left;text-align:center;width:868px;}
            #container #mid404 #content #errorcode{font-size:30px;font-weight:800;}
            #container #mid404 #content p{font-weight:800;}
            #container #mid404 #content #banner{margin:20px 0 0 ;}
            #container #mid404 #content #hostedby{font-weight:800;font-size:25px;font-style:italic;margin:20px 0 0;}
            #container #mid404 #content #coupon{color:#AB0000;font-size:22px;font-style:italic;}
            #container #mid404 #content #getstarted a{color:#AB0000;font-size:31px;font-style:italic;font-weight:800;}
            #container #mid404 #content #getstarted {margin:0 0 35px;}
            #container #bottom404{background-image:url('http://74.53.143.237/images/404bottom.gif');background-repeat:no-repeat;width:868px;height:14px;}
            </style>
</head>
<body>
<div id="container">
    <div id="top404"></div>
    <div id="mid404">

            <div id="gatorbottom"><img src="http://74.53.143.237/images/gatorbottom.png" alt="" /></div>
            <div id="xxx"><img src="http://74.53.143.237/images/x.png" alt="" /></div>
    <div id="content">
            <div id="errorcode">ERROR 404 - PAGE NOT FOUND</div>
            <p>Oops! Looks like the page you're looking for was moved or never existed.<br />Make sure you typed the correct URL or followed a valid link.</p>

            <div id="banner">

                    <object width="728" height="90"><param name="movie" value="http://74.53.143.237/images/hg728x90.swf">

                            <embed src="http://74.53.143.237/images/hg728x90.swf?clickTAG=http://secure.hostgator.com/cgi-bin/affiliates/clickthru.cgi?id=page404" width="728" height="90"></embed>
                    </object>
            </div>

            <div id="hostedby">This site is hosted by HostGator!</div>
            <div id="coupon">Build your website today for 1 cent!   Coupon code: "404PAGE"</div>

            <div id="getstarted"><a href="http://www.hostgator.com/?utm_source=internal&utm_medium=link&utm_campaign=page404" title="HostGator Web Hosting" >CLICK HERE TO GET STARTED</a></div>

    </div>

    <div style="clear:left;"></div>
    </div>
    <div id="bottom404"></div>
</div>

</body>

</html>


Recommended answer


I am trying to put the html source code for any webpage in a string using Javascript

If by "any" you mean pages from origins other than the one your document is served from, you can't do that from JavaScript running in a browser. You're using an ajax call, and ajax calls are restricted by the Same Origin Policy (SOP), which says that (for instance) script running in a document on http://stackoverflow.com can't use ajax to load content from http://example.com. (An "origin" is more than just the domain name: it is the combination of scheme, host and port.)
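To make the comparison concrete, here is a minimal sketch that checks whether a URL is same-origin with the page. `isSameOrigin` is a hypothetical helper, not a browser API; it relies only on the standard `URL` class, whose `origin` property combines scheme, host and port. In a real page you would pass `location.href` as `pageUrl`.

```javascript
// Hypothetical helper: reports whether targetUrl has the same origin
// (scheme + host + port) as the page, which is what the SOP compares.
function isSameOrigin(targetUrl, pageUrl) {
  // Resolve targetUrl against the page so relative URLs work too.
  const target = new URL(targetUrl, pageUrl);
  const page = new URL(pageUrl);
  return target.origin === page.origin;
}

console.log(isSameOrigin("/questions", "http://stackoverflow.com/"));          // true
console.log(isSameOrigin("http://example.com/", "http://stackoverflow.com/"));   // false
console.log(isSameOrigin("https://stackoverflow.com/", "http://stackoverflow.com/")); // false: scheme differs
```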

Some of the pages you might request (though probably very few) might support Cross-Origin Resource Sharing (CORS), in which case, if they allow your origin (probably by allowing all origins), you could use ajax to load their content.
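The heart of that permission check can be sketched as follows. This is a simplified model, assuming a plain GET without credentials: the browser only lets the script read the response if it carries an `Access-Control-Allow-Origin` header equal to `*` or to the page's exact origin. When the check fails, the request fails for the script (`XMLHttpRequest` fires `error`, `fetch` rejects with a `TypeError`). `corsAllowsRead` is a hypothetical name; real browsers also handle credentials, preflight requests and other headers, which this sketch ignores.

```javascript
// Simplified model of the browser's CORS read check for a simple,
// non-credentialed GET request. Not a browser API; illustration only.
function corsAllowsRead(allowOriginHeader, requestOrigin) {
  if (allowOriginHeader == null) {
    return false; // no header at all: the response is hidden from the script
  }
  const value = allowOriginHeader.trim();
  // "*" allows every origin; otherwise the origin must match exactly.
  return value === "*" || value === requestOrigin;
}

console.log(corsAllowsRead("*", "http://stackoverflow.com"));                        // true
console.log(corsAllowsRead("http://stackoverflow.com", "http://stackoverflow.com")); // true
console.log(corsAllowsRead("http://example.org", "http://stackoverflow.com"));       // false
console.log(corsAllowsRead(null, "http://stackoverflow.com"));                        // false
```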

If you're running JavaScript outside the browser (NodeJS, SilkJS, RingoJS, Rhino, Windows Scripting Host, etc.), then the SOP wouldn't apply, but I suspect you'd probably need to use something other than the XMLHttpRequest object to do it.
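As an illustration of the non-browser case, here is a minimal sketch assuming Node.js 18 or later, where `fetch()` is built in (older versions would need the `http`/`https` modules instead). `getHtml` is a hypothetical name. Unlike the question's `httpGet`, it rejects on a non-2xx status instead of returning an error page's HTML.

```javascript
// Sketch for Node.js 18+, where fetch() is available globally.
// Outside the browser there is no Same Origin Policy to restrict the request.
async function getHtml(url) {
  const response = await fetch(url); // follows redirects by default
  if (!response.ok) {
    // Fail loudly rather than silently returning a 404 page's source
    throw new Error(`GET ${url} failed: ${response.status} ${response.statusText}`);
  }
  return response.text();
}
```

For example: `getHtml("https://www.google.com/").then((html) => console.log(html.length));`. Checking `response.ok` is a deliberate choice, because `responseText` on an `XMLHttpRequest` hands back the body of a 404 page just as readily as that of the page you wanted.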

But fundamentally, in a web page (not an extension/add-on) in a browser, you can't do that.


...but i always get the ... source code for "PAGE NOT FOUND" page

But that sounds like the URL is just wrong.
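One plausible way to end up with that page, offered as an assumption rather than a diagnosis: the 404 page shown above is served by the asker's own host (HostGator), so the request may never have left that server. If the URL is passed without a scheme, for example `httpGet("google.com")`, XMLHttpRequest resolves it relative to the current page, like any relative link. The standard `URL` constructor shows that resolution; `resolveAgainstPage` and the page address `http://my-site.example/dir/page.html` are hypothetical.

```javascript
// Resolve a URL relative to the page, as XMLHttpRequest.open() does.
const pageUrl = "http://my-site.example/dir/page.html"; // hypothetical page address

function resolveAgainstPage(url) {
  return new URL(url, pageUrl).href;
}

console.log(resolveAgainstPage("google.com"));
// → "http://my-site.example/dir/google.com" (a path on your own server, hence a 404)
console.log(resolveAgainstPage("http://www.google.com/"));
// → "http://www.google.com/"
```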
