How to document a single space character within a string in reST/Sphinx?
Question
I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following: Starting character is the blank " " which has the value 0.
I tried writing this as an inline literal in the following ways:
Starting character is the blank `` `` which has the value 0.
or
Starting character is the blank :literal:` ` which has the value 0.
But there are a few problems with how these end up working:
- reST syntax doesn't allow whitespace immediately inside the literal's delimiters, so the literal isn't recognized.
- The above can be "fixed" by putting a non-breaking space character inside the literal. It then looks correct in the HTML and plaintext (" ") output, but technically this is a lie in our case: if a user copied this character, they wouldn't be copying what they expect.
- The space can be wrapped in regular quotes, which allows the literal to be properly recognized. The HTML output is probably fine (" "), but in plaintext it ends up double-quoted as "" "".
- In both the second and third cases above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space, because it's at the start/end of the line.
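The wrapping problem from the last case, and the copy problem with a no-break space, can be reproduced with nothing but Python's standard library. This is an illustrative sketch, not Sphinx's actual text writer: it calls `textwrap.wrap` directly on an already-rendered sentence, and `width=33` is an arbitrary value chosen so that the break lands between the quotes.

```python
import textwrap
import unicodedata

# Second case: a no-break space only *looks* like a space. Copying it from the
# rendered docs yields U+00A0, not U+0020.
nbsp = "\u00a0"
assert nbsp != " "
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

# Third case: the wrapper treats each quote as a separate word, breaks at the
# space between them, and drops that space as trailing/leading whitespace.
sentence = 'Starting character is the blank " " which has the value 0.'
lines = textwrap.wrap(sentence, width=33)
for line in lines:
    print(repr(line))
# 'Starting character is the blank "'
# '" which has the value 0.'
```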
I feel like I'm missing something; is there a good way to handle this?
Recommended answer
I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (we'll sort out exactly what it should look like during our review process), but the basics are intact.
There are two main components to the approach:
- Introduce a char role which expects the Unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
- Modify the text wrapper Sphinx uses so that it won't break at the space.
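The second component can be tried out without Sphinx. Below is a minimal sketch, assuming rendered literals are wrapped in double quotes; `LiteralSafeWrapper` is a hypothetical name, and the trick relies on the fact that CPython's `textwrap.TextWrapper._split()` (a private method) splits on `wordsep_simple_re` when `break_on_hyphens` is false. It refuses to split on any run of spaces that touches a `"`, so `" "` stays inside a single chunk.

```python
import re
import textwrap


class LiteralSafeWrapper(textwrap.TextWrapper):
    """Hypothetical sketch: never break on spaces adjacent to a double quote."""

    # _split() uses this pattern when break_on_hyphens is False; the capturing
    # group keeps the separators as their own chunks, as textwrap expects.
    wordsep_simple_re = re.compile(r'((?<!")[ ]+(?!"))')

    def __init__(self, **kwargs):
        # The override above is only consulted when hyphen-breaking is off.
        kwargs.setdefault("break_on_hyphens", False)
        super().__init__(**kwargs)


sentence = 'Starting character is the blank " " which has the value 0.'
lines = LiteralSafeWrapper(width=33).wrap(sentence)
for line in lines:
    print(repr(line))
# 'Starting character is the'
# 'blank " " which has the value 0.'
```

The trade-off is that the words on either side of the literal get glued to it (here `blank " " which` becomes one unbreakable chunk), and any other double-quoted text in the paragraph is treated the same way.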
Here's the code:
```python
import re
import unicodedata

from docutils import nodes
from sphinx.writers.text import TextWrapper


class TextWrapperDeux(TextWrapper):
    """Sphinx's text-writer wrapper, but never splitting on whitespace that
    touches a backtick, so the space inside a literal survives wrapping."""

    wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not adjacent to a backtick
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash


def char_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Describe a character given by its Unicode name.

    e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
    """
    try:
        character = unicodedata.lookup(text)
    except KeyError:
        msg = inliner.reporter.error(
            ':char: argument %s must be a valid Unicode name' % text, line=lineno)
        prb = inliner.problematic(rawtext, rawtext, msg)
        return [prb], [msg]
    describe_char = "(U+%05X %s)" % (ord(character), text)
    # "char:" label, then the character itself as a literal, then its description.
    char = nodes.inline("char:", "char:", nodes.literal(character, character))
    char += nodes.inline(describe_char, describe_char)
    return [char], []


def setup(app):
    app.add_role('char', char_role)
```
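The lookup-and-format part of `char_role` can be checked on its own with the standard library. `describe_char` below is a hypothetical helper, not something Sphinx calls: it just assembles the plain-text form given in the role's docstring (the literal character in backticks, then the code point and name) and, like the role, lets `unicodedata.lookup` raise `KeyError` for an unknown name.

```python
import unicodedata


def describe_char(name: str) -> str:
    """Hypothetical helper mirroring the :char: role's docstring format."""
    character = unicodedata.lookup(name)  # KeyError if the name is unknown
    return "char:`%s`(U+%05X %s)" % (character, ord(character), name)


print(describe_char("SPACE"))  # char:` `(U+00020 SPACE)
```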
The code above lacks some glue to actually force the use of the new TextWrapper, imports, etc. When a full version settles out I may try to find a meaningful way to republish it; if so I'll link it here.
Markup like: Starting character is the :char:`SPACE` which has the value 0.
It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.
And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.
The HTML output ends up looking roughly like: Starting character is the char:(U+00020 SPACE) which has the value 0.