Best way to remove all but the 5 predefined HTML entities with PHP - for XHTML5 output
Question
I'm currently experimenting with delivering XHTML5. At the moment I deliver XHTML 1.1 Strict on the page I'm working on, at least to capable browsers. For those that don't accept XML-encoded data I fall back to HTML 4.01 Strict.
In experimenting with HTML5 for both cases, everything works more or less as expected when the page is delivered as HTML5. The first issue I have when delivering as XHTML5, however, is with the HTML entities. FF4 says &uuml; is an undefined entity, because there is no HTML5 DTD.
I read that the HTML5 wiki currently recommends:
Do not use entity references in XHTML (except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;)
I do need &lt; and &gt; in certain places. Hence my question: what is the best way in PHP to decode all but the five entities named above? html_entity_decode() decodes all of them, so is there a reasonable way to exclude some?
UPDATE:
I went with a simple replace / replace-back approach for the moment, so unless there really is an elegant way, the question is solved well enough for my immediate needs.
function non_html5_entity_decode($string)
{
    // The five entities predefined by XML must survive decoding.
    $entities     = array('&amp;', '&apos;', '&lt;', '&gt;', '&quot;');
    $placeholders = array('@@@AMP', '@@@APOS', '@@@LT', '@@@GT', '@@@QUOT');

    // Hide the protected entities behind placeholders (assumes the input
    // never contains these placeholder strings), decode the rest, then restore.
    $string = str_replace($entities, $placeholders, $string);
    $string = html_entity_decode($string);
    return str_replace($placeholders, $entities, $string);
}
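As an alternative to placeholders, the same goal can be sketched with a regular-expression callback: decode every entity reference, then re-escape any result that is one of the five XML markup characters. This is only a sketch under stated assumptions: the function name decode_non_predefined_entities is hypothetical (not part of PHP or the original post), the input and output are assumed to be UTF-8, and it needs PHP 7.0 or later for the type declarations.

```php
<?php
// Hypothetical helper, shown for illustration only.
function decode_non_predefined_entities(string $s): string
{
    // Characters that must remain escaped in XML, with their predefined entities.
    $special = [
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&apos;',
    ];

    // Match decimal (&#252;), hex (&#xFC;) and named (&uuml;) references.
    $pattern = '/&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);/';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($special): string {
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // References such as &lt; or &#60; decode to markup characters;
            // map those back to the predefined entity instead of emitting them raw.
            return $special[$decoded] ?? $decoded;
        },
        $s
    );
}

echo decode_non_predefined_entities('&lt;p&gt;M&uuml;ller &amp; S&ouml;hne &copy;&#60;/p&gt;'), "\n";
// prints: &lt;p&gt;Müller &amp; Söhne ©&lt;/p&gt;
```

Unlike the placeholder approach, this also normalizes numeric references to markup characters (for example &#60;) into the predefined entities, which may or may not be wanted.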
Recommended answer
PAY ATTENTION to universal conversions: html_entity_decode with default parameters does not decode all named entities, only the few defined by the old HTML 4.01 standard. So entities like &copy; (©) will be converted, but some like &plus; (+) will not. To convert ALL named entities, pass ENT_HTML5 in the second parameter (!).
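The difference between the two entity tables can be checked with a short sketch. The flags are spelled out explicitly here (ENT_COMPAT | ENT_HTML401 versus ENT_COMPAT | ENT_HTML5) rather than relying on defaults; the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
$s = '&copy; &plus;';

// HTML 4.01 table: &copy; is known, &plus; is not and is left untouched.
$html401 = html_entity_decode($s, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// HTML5 table: both names are known.
$html5 = html_entity_decode($s, ENT_COMPAT | ENT_HTML5, 'UTF-8');

var_dump($html401 === "\u{A9} &plus;"); // bool(true)
var_dump($html5 === "\u{A9} +");        // bool(true)
```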
Also, if the destination encoding is not UTF-8, the result cannot contain characters with code points above 255, such as &Ascr; (𝒜), whose code point is 119964 > 255.
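A small sketch of the encoding point, assuming PHP 7.0 or later for the "\u{...}" escape: with a UTF-8 target the entity is decoded, while with a single-byte target such as ISO-8859-1 the character cannot be represented and the entity text is left as it was.

```php
<?php
// U+1D49C (decimal 119964) is MATHEMATICAL SCRIPT CAPITAL A.
$utf8   = html_entity_decode('&Ascr;', ENT_COMPAT | ENT_HTML5, 'UTF-8');
$latin1 = html_entity_decode('&Ascr;', ENT_COMPAT | ENT_HTML5, 'ISO-8859-1');

var_dump($utf8 === "\u{1D49C}"); // decoded to the UTF-8 character
var_dump($latin1 === '&Ascr;');  // not representable, left undecoded
```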
So, to convert "ALL POSSIBLE NAMED ENTITIES", you MUST use html_entity_decode($s, ENT_HTML5, 'UTF-8'), but this only works with PHP 5.4+, where the ENT_HTML5 flag was added.
In the particular case of this question, you must also use the flag ENT_NOQUOTES instead of the default ENT_COMPAT, so use html_entity_decode($s, ENT_HTML5 | ENT_NOQUOTES, 'UTF-8').
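To see what this call does to the five predefined entities, here is a minimal sketch with an illustrative input string (the sample text is an assumption, not from the original post). Note that ENT_NOQUOTES only protects &quot; and &apos;; as far as PHP's flags go, &amp;, &lt; and &gt; are still decoded, so they would need to be protected or re-escaped separately if they must stay encoded.

```php
<?php
$s = '&lt;b&gt; &quot;M&uuml;ller&quot; &amp; &apos;S&ouml;hne&apos;';

$out = html_entity_decode($s, ENT_HTML5 | ENT_NOQUOTES, 'UTF-8');

// Quotes stay encoded; &lt;, &gt;, &amp; and the umlauts are decoded.
var_dump($out === "<b> &quot;M\u{FC}ller&quot; & &apos;S\u{F6}hne&apos;"); // bool(true)
```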
PS (edited): thanks to @BoltClock for the reminder about the PHP version requirement.