PI's content depending on document encoding?

Question

Hello,

I am merely asking this for my own understanding:

A processing instruction's data part is not entity-aware, i.e. character
and numerical entities are not resolved at parsing time. E.g.,

<?mypi &lt;par/&gt; ?>

delivers as data part the String(!) "&lt;par/&gt;".

This effectively means that the possible character content of a PI is
limited by the document's encoding, since numerical entities cannot be
used to express characters outside of this encoding.
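
A quick way to observe this is Python's standard-library SAX parser, which is built on expat. The sketch below assumes Python 3; the sample document and the `Collector` and `parse` helpers are made up for illustration. It puts the same two references (`&lt;par/&gt;` and `&#x20AC;`) once into a PI and once into element content and prints what the parser delivers for each.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Collector(ContentHandler):
    """Records PI data and character data exactly as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.pis = []   # (target, data) pairs
        self.text = []  # character data chunks (expat may split them)

    def processingInstruction(self, target, data):
        self.pis.append((target, data))

    def characters(self, content):
        self.text.append(content)


def parse(xml_text):
    """Parse an XML string (encoded as UTF-8) and return the filled-in handler."""
    handler = Collector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


# The same references, once inside a PI and once inside element content.
doc = "<doc><?mypi &lt;par/&gt; &#x20AC; ?><p>&lt;par/&gt; &#x20AC;</p></doc>"
h = parse(doc)
print(h.pis[0][1].strip())     # &lt;par/&gt; &#x20AC;
print(ascii("".join(h.text)))  # '<par/> \u20ac'
```

Only the element content has its references resolved; the PI data keeps them as literal text, so a PI has no escaped spelling for a character the document encoding cannot represent.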

Consequently, this means that writing a PI and using any character
outside the ASCII range is bound for trouble when submitting such a
document (originally, say, in UTF-8) to an unknown XML workflow, since
intermediary stages may decide to serialize the document to e.g. ASCII
and therefore will lose any characters outside that range within PIs.
Is my understanding correct?

Regards, Christian.
--
Christian Roth
Email: roth (at) visualclick (dot) de
Mac.Java.Pasta.Sopranosax.Single.
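
The consequence can be simulated with a deliberately naive intermediary. In this sketch (Python 3, standard-library SAX parser), Python's `xmlcharrefreplace` codec error handler stands in for a hypothetical stage that re-encodes the document as ASCII by turning every non-ASCII character into a numeric character reference; the document and helper names are illustrative only.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Collector(ContentHandler):
    """Keeps stripped PI data and the text content."""

    def __init__(self):
        super().__init__()
        self.pis = []
        self.text = []

    def processingInstruction(self, target, data):
        self.pis.append(data.strip())

    def characters(self, content):
        self.text.append(content)


def parse(data):
    """Parse XML bytes; return (list of PI data, concatenated text content)."""
    handler = Collector()
    xml.sax.parseString(data, handler)
    return handler.pis, "".join(handler.text)


# Fine as UTF-8: a euro sign appears both in a PI and in element content.
original = "<doc><?mypi price: \u20ac5 ?><p>\u20ac5</p></doc>"

# Hypothetical intermediary: re-encode as ASCII, replacing every
# non-ASCII character with a numeric character reference.
ascii_bytes = original.encode("ascii", "xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))
# <doc><?mypi price: &#8364;5 ?><p>&#8364;5</p></doc>

before = parse(original.encode("utf-8"))
after = parse(ascii_bytes)
print(before[1] == after[1])  # True: the text content survives
print(before[0] == after[0])  # False: the PI now literally contains "&#8364;"
```

The escaped euro sign round-trips in element content but changes the PI's data. A stage that has to emit ASCII therefore has no correct way to write such a PI: it can only fail or, as here, alter it.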

Answer

In article <1h*******************************@visualclick.de> ,
Christian Roth <ro*********@visualclick.de> wrote:
> This effectively means that the possible character content of a PI is
> limited by the document's encoding, since numerical entities cannot be
> used to express characters outside of this encoding.

Yes.

(Well, it is *possible* to put arbitrary characters in a PI by means
of an entity:

<!DOCTYPE foo [
<!ENTITY pi "<?pi here is a euro symbol: &#x20AC; ?>">
]>
<foo>
&pi;
</foo>

but this is not practical in most circumstances.)
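
The workaround works because a character reference inside a literal entity value is replaced when the declaration is read, so the entity's replacement text already contains U+20AC by the time `&pi;` is expanded in content. A small sketch, assuming Python 3 and its standard-library SAX parser (expat), which expands internal entities; `PICollector` is a made-up name:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PICollector(ContentHandler):
    """Collects (target, data) for every processing instruction."""

    def __init__(self):
        super().__init__()
        self.pis = []

    def processingInstruction(self, target, data):
        self.pis.append((target, data))


# Richard's example, as pure ASCII bytes. The character reference sits in
# the entity's literal value, so the PI produced by expanding &pi; holds
# the real euro sign rather than the text "&#x20AC;".
doc = b"""<!DOCTYPE foo [
<!ENTITY pi "<?pi here is a euro symbol: &#x20AC; ?>">
]>
<foo>
&pi;
</foo>"""

handler = PICollector()
xml.sax.parseString(doc, handler)
target, data = handler.pis[0]
print(target, ascii(data.strip()))  # pi 'here is a euro symbol: \u20ac'
```

The document itself stays pure ASCII, which is the point of the trick; the price is an internal DTD subset with one entity declaration per PI.
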
> Consequently, this means that writing a PI and using any character
> outside the ASCII range is bound for trouble when submitting such a
> document (originally, say, in UTF-8) to an unknown XML workflow, since
> intermediary stages may decide to serialize the document to e.g. ASCII
> and therefore will lose any characters outside that range within PIs.

This is equally true for element and attribute names, since character
references cannot be used there either.

-- Richard
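
The restriction on names can be checked the same way. Again a sketch assuming Python 3 and its standard-library SAX parser, with made-up helpers `NameCollector` and `element_names`: a literal é in an element name is accepted when the document is encoded as UTF-8, while the same name spelled with the character reference `&#xE9;` is not well-formed.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NameCollector(ContentHandler):
    """Records the name of every element start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names(data):
    """Parse XML bytes and return the element names in document order."""
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names


# A literal e-acute in the name is fine if the encoding can carry it.
print(ascii(element_names("<caf\u00e9/>".encode("utf-8"))))  # ['caf\xe9']

# The same name written with a character reference is rejected.
try:
    element_names(b"<caf&#xE9;/>")
except xml.sax.SAXParseException as err:
    print("rejected:", err.getMessage())
```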


Richard Tobin <ri*****@cogsci.ed.ac.uk> wrote:
>> Consequently, this means that writing a PI and using any character
>> outside the ASCII range is bound for trouble when submitting such a
>> document (originally, say, in UTF-8) to an unknown XML workflow, since
>> intermediary stages may decide to serialize the document to e.g. ASCII
>> and therefore will lose any characters outside that range within PIs.

> This is equally true for element and attribute names, since character
> references cannot be used there either.

Thank you very much for the detailed answer, Richard - highly
appreciated!

Do you know if there is a technical reason(ing) for not having the
parser resolve (at least) numerical entities in PI data, element and
attribute names (and I think comments as well) in XML? Would this
possibly create ambiguous states during parsing?

Regards, Christian.

--
Christian Roth
Email: roth (at) visualclick (dot) de
Mac.Java.Pasta.Sopranosax.Single.


In article <1h*******************************@visualclick.de> ,
Christian Roth <ro*********@visualclick.de> wrote:
> Do you know if there is a technical reason(ing) for not having the
> parser resolve (at least) numerical entities in PI data, element and
> attribute names (and I think comments as well) in XML?

All this is inherited from SGML. So the best I can do is "historical
reasons". I imagine it just wasn't considered important enough.

-- Richard

