Why do DocBook generated XHTML5 Section titles have ASCII #c2 characters in them?
Question
I noticed my generated XHTML5 numbered section titles have an Â between the number and the title string. I thought this was a generation error. But no, the gentext file of my DocBook distribution, common/en.xml, actually specifies this.
Line 338 of common/en.xml:
<l:template name="section" text="%n. %t"/>
The dot and space following the %n are, when viewed in a hex editor, ASCII character codes C2 and A0, which are the Â and NBSP characters respectively. I can understand NBSP. But why the Â?
I understand I can change this in my customization layer. But the default seems odd.
I'm using docbook-xsl-ns-1.77.1.
Answer
That is because the file is encoded in UTF-8, which is the normal Unicode encoding for text these days. The bytes C2 and A0 are not ASCII at all (ASCII only covers 0x00 through 0x7F). In UTF-8, any character above 0x7F is represented by a sequence of 2, 3, or 4 bytes depending on how many significant code bits it contains.
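A quick way to see the variable-length encoding is to encode a few sample characters in Python and count the bytes. This is a minimal sketch; apart from U+00A0, the sample characters are arbitrary choices and have nothing to do with DocBook.

```python
# Show how many UTF-8 bytes a few sample code points need.
samples = {
    "A (U+0041)": "\u0041",
    "NO-BREAK SPACE (U+00A0)": "\u00a0",
    "EURO SIGN (U+20AC)": "\u20ac",
    "GRINNING FACE (U+1F600)": "\U0001F600",
}

for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    # bytes.hex(sep) (Python 3.8+) prints the bytes as space-separated hex pairs
    print(f"{name}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")
```

Run as-is, it reports 1, 2, 3 and 4 bytes respectively, and the no-break space comes out as `c2 a0`, the same two bytes seen in the hex editor.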
The 0xC2 is one of the bytes that start a 2-byte sequence. In binary, it's 1100 0010. The two leading 1 bits (followed by a 0 bit) denote a 2-byte sequence, and the bottom five bits are the high-order five bits of the encoded character. The second byte, 0xA0, is 1010 0000. The single leading 1 bit (followed by a 0 bit) denotes a continuation of the sequence, and the bottom six bits are the low-order six bits of the encoded character.
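The lead-byte/continuation-byte rule boils down to a few bit-mask tests. In this sketch, `describe_utf8_byte` is a hypothetical helper, and it deliberately skips the extra validity checks a real decoder performs (for example, 0xC0, 0xC1 and 0xF5 through 0xFF never occur in valid UTF-8).

```python
def describe_utf8_byte(b: int) -> str:
    """Classify one byte by its high-order bits (no overlong/range validation)."""
    if (b & 0b1000_0000) == 0:
        return "single-byte (ASCII)"
    if (b & 0b1100_0000) == 0b1000_0000:
        return "continuation byte"
    if (b & 0b1110_0000) == 0b1100_0000:
        return "lead byte of a 2-byte sequence"
    if (b & 0b1111_0000) == 0b1110_0000:
        return "lead byte of a 3-byte sequence"
    if (b & 0b1111_1000) == 0b1111_0000:
        return "lead byte of a 4-byte sequence"
    return "invalid in UTF-8"

# The two bytes from the hex editor
for b in (0xC2, 0xA0):
    print(f"0x{b:02X} = {b:08b}: {describe_utf8_byte(b)}")
```

This prints `11000010` for 0xC2 (a 2-byte lead byte) and `10100000` for 0xA0 (a continuation byte).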
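Putting the bottom five bits from the first byte (00010) together with the bottom six bits from the second (100000), we get 000 1010 0000, which is U+00A0 in hex: the no-break space. An editor or browser that reads the file as ISO-8859-1 or Windows-1252 instead of UTF-8 treats each byte as a separate character, so 0xC2 shows up as Â, followed by a no-break space for 0xA0. That is where the stray Â comes from.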
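Reassembling the code point is a mask-and-shift. This is a minimal sketch assuming a well-formed 2-byte sequence; `decode_two_byte_utf8` is a hypothetical helper, and in real code `bytes.decode("utf-8")` does this for you. The last lines compare that with decoding the same bytes as Latin-1.

```python
def decode_two_byte_utf8(b0: int, b1: int) -> int:
    """Combine a 2-byte UTF-8 sequence into its code point (prefix checks only)."""
    if (b0 & 0b1110_0000) != 0b1100_0000:
        raise ValueError(f"0x{b0:02X} is not a 2-byte lead byte")
    if (b1 & 0b1100_0000) != 0b1000_0000:
        raise ValueError(f"0x{b1:02X} is not a continuation byte")
    # Low five bits of the lead byte become the high bits of the code point;
    # low six bits of the continuation byte become the low bits.
    return ((b0 & 0b0001_1111) << 6) | (b1 & 0b0011_1111)

cp = decode_two_byte_utf8(0xC2, 0xA0)
print(f"U+{cp:04X} = {cp:011b}")        # U+00A0 = 00010100000

raw = bytes([0xC2, 0xA0])
print(raw.decode("utf-8") == "\u00a0")  # True: one no-break space
print(repr(raw.decode("latin-1")))      # 'Â\xa0': two characters, hence the Â
```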