How to get the static, original HTML source via JavaScript?


Question


While developing a tool (its details don't matter for this question, since I was able to reproduce the problem in the MCVEs below), I noticed that, at least in the Chrome and Firefox versions I have on my desktop, the string I get from the innerHTML property is not equal to the original source code I wrote statically in the HTML file.

console.log(document.querySelector("div").innerHTML);
/*
  <table>
    <tbody><tr>
      <td>Hello</td>
      <td>World</td>
    </tr>
  </tbody></table>
*/

<div>
  <table>
    <tr>
      <td>Hello</td>
      <td>World</td>
    </tr>
  </table>
</div>


As you may have noticed, a spontaneous <tbody> tag (which I have not added to my HTML source!) appeared, apparently due to some preprocessing that happens between the page download and the page onload event. In this particular case, for my application's purposes, this modification doesn't cause an error and could thus be ignored.


It turns out that, in certain cases, this sort of alteration can be catastrophic, especially when all the markup is removed, as in the example below.

console.log(document.querySelector("div").innerHTML);
/*
  Hello
  World
*/

<div>
  <td>Hello</td>
  <td>World</td>
</div>


Obviously, in this case the original markup has issues, but in my application, "misuses" (like a <td> inside a <div>) are accepted. What is not accepted is the innerHTML being left with no HTML markup at all, which leads to the main question: how can I get the original, statically coded HTML markup for the <div> element?


Also, if possible, it would be nice to know why and how this phenomenon occurs, because I'm curious :D

Recommended answer


The browser downloads the HTML source and parses it into a DOM (Document Object Model). Any issues are fixed as well as possible, and elements that may be omitted in the source (such as <tbody>) might be added in the DOM.


From that moment on, this in-memory structure is used to render the page, and it is also the structure you work with in JavaScript. So if you request the innerHTML of an element, you just get a piece of HTML source code that is serialized from the DOM. The original source text itself is not exposed to JavaScript.


So, that's the reason why it happens. And also there is not much you can do about it. I think the only workaround is to re-load the entire page using AJAX into a string and get the required piece of source yourself.
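
The re-load workaround could look roughly like the sketch below. It is only an illustration under some assumptions: `fetchOriginalSource` and `extractInnerSource` are hypothetical helper names, the page is assumed to be reachable again at the same URL and to return the same markup, and the extraction is a naive text search that does not handle nested elements of the same tag.

```javascript
// Browser-side helper (hypothetical name): download a URL again as plain text.
// The text is never parsed into a DOM, so it is not "repaired" by the browser.
async function fetchOriginalSource(url) {
  const response = await fetch(url, { cache: "no-store" });
  if (!response.ok) {
    throw new Error("HTTP " + response.status);
  }
  return response.text();
}

// Pure helper (hypothetical name): return the raw source between the first
// <tagName ...> and the next </tagName> in `source`, or null if not found.
// Assumes `tagName` is a plain tag name such as "div"; nesting is not handled.
function extractInnerSource(source, tagName) {
  const open = new RegExp("<" + tagName + "(\\s[^>]*)?>", "i");
  const openMatch = open.exec(source);
  if (!openMatch) {
    return null;
  }
  const start = openMatch.index + openMatch[0].length;

  // Search for the closing tag, starting right after the opening tag.
  const close = new RegExp("</" + tagName + "\\s*>", "ig");
  close.lastIndex = start;
  const closeMatch = close.exec(source);
  if (!closeMatch) {
    return null;
  }
  return source.slice(start, closeMatch.index);
}
```

In the page itself, this might be used as `fetchOriginalSource(location.href).then(src => console.log(extractInnerSource(src, "div")));`. Keep in mind that this issues a second request, so a dynamically generated page may not return exactly the markup the browser originally received.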


But a better solution, obviously, would be to remove those "misuses" and make your HTML source valid. If you just need to enclose some information in the page to be used by JavaScript alone, you might choose to embed a script tag that initializes a couple of variables with those values, rather than generating some invalid HTML.
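
As a rough sketch of that idea, the data could be embedded as JSON in a script element and read back from JavaScript. This is a variation on "a script tag that initializes a couple of variables", not something taken from the question: the element id `cell-data`, the JSON shape, and the helper name `readEmbeddedJson` are all made up for illustration.

```javascript
// Hypothetical markup placed in the page. The browser does not parse the
// contents of a script element as HTML, so the text is kept as written:
//
//   <script type="application/json" id="cell-data">
//     {"cells": ["Hello", "World"]}
//   </script>

// Read and parse the embedded JSON. `doc` is normally the global `document`;
// it is passed in as a parameter so the function can be tried with any object
// that provides getElementById.
function readEmbeddedJson(doc, id) {
  const node = doc.getElementById(id);
  if (!node) {
    return null;
  }
  return JSON.parse(node.textContent);
}
```

In the browser this would be called as `readEmbeddedJson(document, "cell-data").cells`, which keeps the page's HTML valid while still handing the values to the script.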
