Viewing the actual source code of a website

Problem description

I'll explain my question with an example. Suppose I go to the URL: http://www.google.co.il/#q=university

and then I right-click and choose "view source". I don't get the real HTML source. I'm sure of that because when I search the code for unique words that appear in the rendered page, I get no results.

I know that in Chrome I can select something and choose "inspect element" to see the real source code, but I want to use a Java program to get the code, so I want to understand why I don't see the real HTML source when I choose "view source".

Solution

Well, if you select "view source" you see the actual HTML source code of the page at the URL in your address bar. However, it might be that the page(s) you want to view are "obfuscated" by embedded code that loads external content and inserts it into the HTML after the page has loaded.
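To make this concrete on the Java side, here is a minimal sketch, assuming Java 11+ and its built-in java.net.http client. The class and method names (RawSourceFetcher, stripFragment, fetchRaw) are made up for illustration. It downloads the same raw HTML that "view source" shows. Note that the part of the example URL after "#" is a fragment, which browsers keep on the client side and do not send to the server, so the server's response cannot contain the search results by itself.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RawSourceFetcher {

    // Drops the "#..." part of the URL. The fragment is only used inside
    // the browser (here: by the page's scripts) and is not sent to the server.
    static URI stripFragment(URI uri) {
        String s = uri.toString();
        int hash = s.indexOf('#');
        return hash < 0 ? uri : URI.create(s.substring(0, hash));
    }

    // Downloads the raw HTML, i.e. roughly what "view source" displays.
    // No scripts are executed, so dynamically loaded content is missing.
    static String fetchRaw(URI uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(stripFragment(uri)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = URI.create("http://www.google.co.il/#q=university");
        System.out.println("Fragment (stays in the browser): " + uri.getFragment());
        System.out.println("URL actually requested: " + stripFragment(uri));

        // Pass --fetch to really download the page (needs network access).
        if (args.length > 0 && args[0].equals("--fetch")) {
            String html = fetchRaw(uri);
            System.out.println("Downloaded " + html.length() + " characters of raw HTML");
        }
    }
}
```

Searching the downloaded string for a word that only appears in the rendered results is a quick way to check whether the content you want is in the raw HTML at all.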

If you still want to automatically parse such a page in a "nice" way, you need to run a whole HTML interpreter such as WebKit. That is a hell of a lot of work, and in principle it is what you are doing with "inspect element". The other way is to find the lines in the page's HTML that load the external content and then load that content yourself. If you are lucky, this is not obfuscated on purpose and is fairly easy to achieve for small tasks.
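The second approach could start with something like the following rough sketch, which lists the external scripts a page references so that you can download and inspect them in turn. The names ExternalScriptFinder and findScriptSources are hypothetical. The regular expression is a simplification and not a real HTML parser, and content that a script fetches later (for example through an XMLHttpRequest call) will not show up this way.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExternalScriptFinder {

    // Matches <script ... src="..."> with single or double quotes.
    // Inline scripts without a src attribute are skipped.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of all external scripts found in the HTML,
    // resolving relative paths against the URL the page was loaded from.
    static List<URI> findScriptSources(String html, URI base) {
        List<URI> result = new ArrayList<>();
        Matcher m = SCRIPT_SRC.matcher(html);
        while (m.find()) {
            result.add(base.resolve(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical page content, only for demonstration.
        String html = "<html><head>"
                + "<script src=\"/js/app.js\"></script>"
                + "<script>var x = 1;</script>"
                + "<script src='https://cdn.example.com/lib.js'></script>"
                + "</head><body></body></html>";
        URI base = URI.create("http://example.com/page");
        for (URI src : findScriptSources(html, base)) {
            System.out.println(src);
        }
    }
}
```

Each URL returned can then be fetched the same way as the page itself. Whether the scripts are readable enough to follow depends on the site.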

However, if you need the whole DOM structure, you should think about using (embedding) one of the browser engines...
