Why do DocBook generated XHTML5 Section titles have ASCII #c2 characters in them?

Problem description

I noticed my generated XHTML5 numbered section titles have a Â between the number and the title string. I thought this was a generation error. But no, the gentext file of my DocBook distribution, common/en.xml, actually specifies this.

Line 338 of common/en.xml:

<l:template name="section" text="%n. %t"/>

The dot and space following the %n are, when viewed in a hex editor, ASCII character codes C2 and A0, which are the Â and NBSP characters respectively. I can understand NBSP. But why the Â?

I understand I can change this in my customization layer. But the default seems odd.

I'm using docbook-xsl-ns-1.77.1.

Recommended answer

That is because the encoding is UTF-8, which is the normal Unicode encoding for text these days. In UTF-8, any character above 0x7F is represented by a sequence of 2, 3, or 4 bytes, depending on how many significant bits its code point contains. The bytes C2 and A0 are therefore not two separate characters; together they encode a single one.
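As a quick illustration of the byte counts, here is a minimal Python sketch. The helper name `utf8_bytes` and the sample characters are chosen for this example only and are not part of DocBook or the original answer.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


# Sample characters picked for illustration: one per UTF-8 length class.
samples = ["A", "\u00a0", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} byte(s))")
```

The non-breaking space, U+00A0, falls in the 2-byte range, which is why a hex editor shows two bytes for it.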

0xC2 is one of the bytes that starts a 2-byte sequence. In binary, it's 1100 0010. The leading bits 110 mark the start of a 2-byte sequence, and the bottom five bits (00010) are the high-order bits of the encoded character. The second byte, 0xA0, is 1010 0000. The leading bits 10 mark it as a continuation byte, and the bottom six bits (100000) are the low-order bits of the encoded character.

Putting the bottom five bits from the first byte together with the bottom six bits from the second, we get 000 1010 0000, which is 0xA0 in hex, i.e. U+00A0, which is indeed the non-breaking space.
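The bit arithmetic can be checked with a short Python sketch. The function `decode_two_byte` is a hypothetical helper written for this example. The last lines also show what happens when the same two bytes are read as Latin-1 instead of UTF-8; that a viewer or hex editor was doing this in the question is an assumption, not something the thread confirms.

```python
def decode_two_byte(lead: int, cont: int) -> int:
    """Decode a 2-byte UTF-8 sequence into its code point."""
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("lead byte does not start a 2-byte sequence")
    if cont & 0b11000000 != 0b10000000:
        raise ValueError("second byte is not a continuation byte")
    # Five payload bits from the lead byte, six from the continuation byte.
    return ((lead & 0b00011111) << 6) | (cont & 0b00111111)


code_point = decode_two_byte(0xC2, 0xA0)
print(f"U+{code_point:04X}")

raw = bytes([0xC2, 0xA0])
as_utf8 = raw.decode("utf-8")      # one character: the non-breaking space
as_latin1 = raw.decode("latin-1")  # two characters: Â followed by NBSP
print(len(as_utf8), len(as_latin1))
```

If generated pages display a literal Â, checking that the output is served and declared as UTF-8 is a reasonable first step.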
