Best way to remove all but the 5 predefined HTML entities with PHP - for XHTML5 output


Question

I'm currently experimenting with delivering XHTML5. At the moment the page I'm working on is served as XHTML 1.1 Strict to capable browsers; for browsers that don't accept XML-encoded data I fall back to HTML 4.01 Strict.

While experimenting with HTML5 for both variants, everything works more or less as expected when the page is delivered as HTML5. The first issue I hit when delivering it as XHTML5, however, is with HTML entities: FF4 says &uuml; is an undefined entity, because there is no HTML5 DTD.

I read that the HTML5 wiki currently recommends:

Do not use entity references in XHTML (except for the 5 predefined entities: &amp;, &lt;, &gt;, &quot; and &apos;)

I do need &lt; and &gt; in certain places. Hence my question: what is the best way in PHP to decode all but the five entities named above? html_entity_decode() decodes all of them, so is there a reasonable way to exclude some?

UPDATE:

I went with a simple replace / replace-back approach for the moment, so unless there really is an elegant way, the question is solved well enough for my immediate needs.

function non_html5_entity_decode($string)
{
    // The five predefined XML entities and a temporary placeholder for each.
    $entities     = array('&amp;', '&apos;', '&lt;', '&gt;', '&quot;');
    $placeholders = array('@@@AMP', '@@@APOS', '@@@LT', '@@@GT', '@@@QUOT');

    // Hide the five entities, decode everything else, then put them back.
    $string = str_replace($entities, $placeholders, $string);
    $string = html_entity_decode($string);
    $string = str_replace($placeholders, $entities, $string);
    return $string;
}


Recommended answer

Pay attention with universal conversions: html_entity_decode with default parameters does not convert all named entities, only the ones defined by the old HTML 4.01 standard. So entities like &copy; (©) will be converted, but some, like &plus; (+), will not. To convert all named entities, use ENT_HTML5 in the second parameter.
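The difference between the two entity tables can be seen in a small sketch. The flags and encoding are passed explicitly here (an assumption made so the result does not depend on the defaults of a particular PHP version), and the variable names are only illustrative.

```php
<?php
$s = '&copy; &plus;';

// HTML 4.01 table: &copy; is known, &plus; is not and stays untouched.
$html401 = html_entity_decode($s, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// HTML5 table: both named entities are known and get decoded.
$html5 = html_entity_decode($s, ENT_COMPAT | ENT_HTML5, 'UTF-8');

var_dump($html401 === '© &plus;');
var_dump($html5 === '© +');
```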

Also, if the destination encoding is not UTF-8, it cannot receive code points above 255, such as &Ascr; (𝒜), which is 119964 > 255.

So, to convert all possible named entities, you must use html_entity_decode($s, ENT_HTML5, 'UTF-8'), but this works only on PHP 5.4+, where the ENT_HTML5 flag was introduced.
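As a minimal sketch of that call, the snippet below decodes an entity outside the Latin-1 range together with an ordinary one. The sample string and variable names are made up for illustration; the expected value is written as raw UTF-8 bytes so the check does not depend on the encoding of the source file.

```php
<?php
// Decode with the HTML5 entity table into UTF-8, so code points
// above 255 have somewhere to go.
$decoded = html_entity_decode('&Ascr; &uuml;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// U+1D49C (decimal 119964) is four bytes in UTF-8; U+00FC is two.
$expected = "\xF0\x9D\x92\x9C \xC3\xBC";

var_dump($decoded === $expected);
```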

In the particular case of this question, you must also use the ENT_NOQUOTES flag instead of the default ENT_COMPAT, so the call becomes html_entity_decode($s, ENT_HTML5 | ENT_NOQUOTES, 'UTF-8').
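One caveat worth checking before relying on this: ENT_NOQUOTES only concerns the quote entities, so &amp;, &lt; and &gt; are still decoded. The sketch below is one possible alternative, not the answer's method: a hypothetical helper, decode_non_xml_entities(), that walks over named entity references with preg_replace_callback and skips the five predefined ones. It assumes numeric references (such as &#38;) can stay as they are, since XML parsers accept them without a DTD.

```php
<?php
// Hypothetical helper: decode every named entity except the five
// predefined XML entities. Numeric references are left untouched.
function decode_non_xml_entities(string $s): string
{
    $keep = ['&amp;', '&lt;', '&gt;', '&quot;', '&apos;'];

    return preg_replace_callback(
        '/&[A-Za-z][A-Za-z0-9]*;/',
        function (array $m) use ($keep) {
            if (in_array($m[0], $keep, true)) {
                return $m[0]; // one of the five: keep it encoded
            }
            // Unknown names come back unchanged from html_entity_decode.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $s
    );
}

$in  = '&lt;p&gt; M&uuml;ller &amp; S&ouml;hne &quot;&copy;&quot; &plus; &apos;';
$out = decode_non_xml_entities($in);

// ENT_NOQUOTES alone does not protect the ampersand entity.
$amp = html_entity_decode('&amp;', ENT_HTML5 | ENT_NOQUOTES, 'UTF-8');
```

Compared with the placeholder approach from the question, this avoids any risk of the input already containing a marker string such as @@@AMP, at the cost of one callback per entity reference.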

PS (edited): thanks to @BoltClock for the reminder about the PHP version requirement.
