Extracting blocks of text from ReST documents by :ref:?

Question

I have some reStructuredText documentation and would like to use snippets from it in online help. One approach seems to be to 'snip' out pieces of markup by reference, e.g.

.. _my_boring_section:

Introductory prose
------------------

blah blah blah

.. _my_interesting_section:

About this dialog
-----------------

talk about stuff which is relevant in contextual help

How could I use Python/Docutils/Sphinx to extract the markup for the _my_interesting_section marker?

Recommended answer

I'm not sure how you could do this other than by subclassing and customising the Docutils parser. If you just need the relevant section of reStructuredText and don't mind losing some of the markup, you can try the approach below. Alternatively, the processed markup for a particular section (i.e. reStructuredText converted to HTML or LaTeX) is very easy to get. See my answer to this question for an example of extracting part of the processed XML. Let me know if this is what you want. Anyway, here goes...

You can manipulate reStructuredText very easily using Docutils. First, use the Docutils publish_doctree function to build the document tree (doctree) representation of the reStructuredText. The doctree can then be traversed and searched for particular document elements, e.g. sections, with particular attributes. The easiest way to find a particular section reference is to inspect the ids attribute of the doctree itself: doctree.ids is simply a dictionary mapping every reference id to the corresponding part of the document.

from docutils.core import publish_doctree

s = """.. _my_boring_section:

Introductory prose
------------------

blah blah blah

.. _my_interesting_section:

About this dialog
-----------------

talk about stuff which is relevant in contextual help
"""

# Parse the above string to a Docutils document tree:
doctree = publish_doctree(s)

# Get element in the document with the reference id `my-interesting-section`:
ids = 'my-interesting-section'

try:
    section = doctree.ids[ids]
except KeyError:
    # Do some exception handling here...
    raise KeyError('No section with ids {0}'.format(ids))

# Can also make sure that the element we found was in fact a section:
import docutils.nodes
isinstance(section, docutils.nodes.section) # Should be True

# Finally, get the section text (title and body, with the markup stripped):
print(section.astext())

# This will print:
# About this dialog
#
# talk about stuff which is relevant in contextual help

Now the markup has been lost. If there is nothing too fancy, it would be easy to insert some dashes under the first line of the result above to get back to your section heading. I'm not sure what you would need to do for more complicated inline markup. Hopefully the above is a good starting point for you, though.
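As a rough sketch of the "insert some dashes" idea, the helper below puts an underline back beneath the first line of the plain text returned by section.astext(). The name restore_heading and the underline parameter are hypothetical, chosen for illustration; the helper assumes the title and body are separated by a blank line, as in the output above, and it does nothing to recover inline markup.

```python
def restore_heading(text, underline="-"):
    """Re-insert a reST underline beneath the first line of astext() output.

    Assumes the title is the first block of `text` and is followed by a
    blank line, which is how the section text above is laid out.
    """
    title, sep, body = text.partition("\n\n")
    # The underline must be at least as long as the title text.
    return title + "\n" + underline * len(title) + sep + body


plain = "About this dialog\n\ntalk about stuff which is relevant in contextual help"
print(restore_heading(plain))
```

Passing a different character as underline (for example "=") would produce a heading in another adornment style.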

Note: When querying doctree.ids, the id I pass is slightly different from the definition in the reStructuredText: the leading underscore has been removed and all other underscores have been replaced by hyphens. This is how Docutils normalises references. It would be really straightforward to write a function to convert reStructuredText references to Docutils' internal representation. Otherwise, I'm sure that if you dig through Docutils you can find the routine that does this.
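A minimal sketch of such a conversion function follows. The name ref_to_id is hypothetical, and the rules are an approximation covering only what is described above (drop the leading underscore, lowercase, turn underscores and other non-alphanumeric runs into hyphens). Docutils ships its own routine for this, docutils.nodes.make_id, which also handles cases this sketch ignores, such as non-ASCII characters and leading digits, so prefer it when Docutils is available.

```python
import re


def ref_to_id(label):
    """Approximate the id Docutils derives from a target label.

    `label` is the name as written in the source, with or without the
    leading underscore and trailing colon, e.g. "_my_interesting_section".
    This is an illustrative approximation, not Docutils' own code.
    """
    name = label.strip().lstrip("_").rstrip(":").lower()
    # Collapse every run of characters outside [a-z0-9] into one hyphen.
    ident = re.sub(r"[^a-z0-9]+", "-", name)
    return ident.strip("-")


print(ref_to_id("_my_interesting_section"))
```

The result can then be used as the key when looking up doctree.ids.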
