When do I need to escape an HTML string?


Question

In my legacy project I can see escapeHtml being called before a string is sent to the browser:

StringEscapeUtils.escapeHtml(stringBody);

I know from the API doc what escapeHtml does. Here is the example given:

For example: 
"bread" & "butter"
becomes: 
&quot;bread&quot; &amp; &quot;butter&quot;.

My understanding is that when we send the string after escaping HTML, it is the browser's responsibility to convert it back to the original characters. Is that right?

But I don't understand why and when escaping is required. What happens if we send the string body without escaping HTML? What is the cost if we don't call escapeHtml before sending it to the browser?

Answer

I can think of several possibilities to explain why sometimes a string is not escaped:

  • Perhaps the original programmer was confident that at certain places the string had no special characters. (In my opinion this would be bad programming practice; it costs very little to escape a string as protection against future changes.)
  • The string was already escaped at that point in the code. You definitely don't want to escape a string twice; the user will end up seeing the escape sequence instead of the intended text.
  • The string was the actual HTML itself. You don't want to escape the HTML; you want the browser to process it!
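The double-escaping hazard from the second bullet can be sketched in a few lines. This is a minimal illustration, assuming a hand-written `escape` helper (hypothetical, not the Commons Lang implementation) that handles only the four characters discussed in this thread:

```java
public class DoubleEscapeSketch {
    // Hypothetical helper: escapes &, <, > and ". The ampersand must be
    // replaced first, otherwise the entities produced by the later
    // replacements would themselves be escaped.
    public static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String raw = "bread & butter";
        String once = escape(raw);    // what the browser should receive
        String twice = escape(once);  // the mistake: escaping again
        System.out.println(once);
        System.out.println(twice);
    }
}
```

A browser decodes one level of entities, so the twice-escaped string would be displayed with a literal `&amp;` in it rather than a plain ampersand.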

EDIT - The reason for escaping is that special characters like & and < can end up causing the browser to display something other than what you intended. A bare & is technically an error in the HTML. Most browsers try to deal intelligently with such errors and will display them correctly in most cases. (This would almost certainly happen with your example text if the string were text in a <div>, for instance.) However, because it is bad markup, some browsers will not handle it well; assistive technologies (e.g., text-to-speech) may fail; and there may be other problems.
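As a rough illustration of what escaping does to the example string from the question, here is a minimal sketch. The `escapeHtml` method below is a hypothetical stand-in written for this example; it is not the Commons Lang implementation, which covers many more characters:

```java
public class HtmlEscapeSketch {
    // Hypothetical stand-in for an HTML escaper: replaces the characters
    // that a browser could mistake for markup with character references.
    public static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The example from the question.
        System.out.println(escapeHtml("\"bread\" & \"butter\""));
        // Text that would otherwise be parsed as a tag.
        System.out.println(escapeHtml("1 < 2 && <b>bold</b>"));
    }
}
```

The first call yields the escaped form quoted in the question. The browser turns the character references back into the original characters when it renders the page, so the user sees the text as it was written.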

There are several cases that will fail despite the best efforts of the browser to recover from bad markup. If your sample string were an attribute value, escaping the quote marks would be absolutely required. There's no way that a browser is going to correctly handle something like:

<img alt=""bread" & "butter"" ... >
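The attribute case can be sketched as follows. The class and method names are made up for this illustration, and the tag carries only the `alt` attribute to keep the output short:

```java
public class AttributeEscapeSketch {
    // Hypothetical helper covering the two characters that matter inside
    // a double-quoted attribute value: the ampersand and the quote mark.
    public static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;");
    }

    // Builds the tag without escaping: the first quote inside the value
    // terminates the attribute early.
    public static String unsafeImg(String alt) {
        return "<img alt=\"" + alt + "\">";
    }

    // Builds the tag with the value escaped for an attribute context.
    public static String safeImg(String alt) {
        return "<img alt=\"" + escapeAttribute(alt) + "\">";
    }

    public static void main(String[] args) {
        String alt = "\"bread\" & \"butter\"";
        System.out.println(unsafeImg(alt)); // broken markup
        System.out.println(safeImg(alt));   // well-formed markup
    }
}
```

In the unescaped output the browser sees an empty `alt` value followed by stray tokens, which is why no amount of error recovery can restore the intended text.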

The general rule is that any character that is not markup but might be confused with markup needs to be escaped.

Note that there are several contexts in which text can appear within an HTML document, and they have separate requirements for escaping. Within attribute values, you need to escape quote marks and the ampersand (but not <). You must escape characters that have no representation in the character set of the document (unlikely if you are using UTF-8, but that's not always the case). Within text nodes, only & and < need to be escaped. Within href values, characters that need escaping in a URL must be escaped (and sometimes doubly escaped so they are still escaped after the browser unescapes them once). Within a CDATA block, generally nothing should be escaped (at the HTML level).
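The href case involves two layers, which a short sketch may make clearer. The `/search` path and the `q` and `lang` parameter names are invented for this example; `URLEncoder` is the standard JDK class, while `escapeAttribute` is a hypothetical helper:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class HrefEscapeSketch {
    // Hypothetical helper for double-quoted attribute values.
    public static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;");
    }

    // Layer 1: URL-encode the query value so that "&" inside the value
    // is not read as a parameter separator.
    // Layer 2: HTML-escape the finished URL so that the "&" separating
    // the parameters is not read as the start of a character reference.
    public static String searchLink(String query) {
        String encoded;
        try {
            encoded = URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every JVM.
            throw new IllegalStateException(e);
        }
        String url = "/search?q=" + encoded + "&lang=en";
        return "<a href=\"" + escapeAttribute(url) + "\">search</a>";
    }

    public static void main(String[] args) {
        System.out.println(searchLink("bread & butter"));
    }
}
```

The browser undoes the HTML layer when it parses the attribute, and the server undoes the URL layer when it reads the query string, so each layer has to be applied exactly once and in this order.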

Finally, aside from the hazard of double-escaping, the cost of escaping all text is minimal: a tiny bit of extra processing and a few extra bytes on the network.
