HTML tags in R Markdown to Word document


Question

Is it possible to use HTML tags in R Markdown documents that are rendered to Word?

For example:

---
output: word_document
---

# This is rendered as heading

<h1> But this is not </h1>

This works perfectly when rendering as html_document, but not when rendering as word_document: the <h1> line is not rendered as a heading.

A more specific question about tags has been asked here, but without a solution: Underline in RMarkdown to Microsoft Word

Recommended answer

OK, here we go:

---
output:
  word_document:
    md_extensions: +raw_html-markdown_in_html_blocks
    pandoc_args: ['--lua-filter', 'read_html.lua']
---

# This is rendered as heading

<h1> And this is one, too </h1>

where read_html.lua must be a file in the same directory as the R Markdown document, with this content:

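-- Runs on every raw block in the document.
function RawBlock (raw)
  -- Only handle raw HTML, and only when the output format is not HTML itself.
  if raw.format:match 'html' and not FORMAT:match 'html' then
    -- Parse the HTML with pandoc's HTML reader and substitute the resulting blocks.
    return pandoc.read(raw.text, raw.format).blocks
  end
end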

Let's unpack the above to see how it works. The first thing you'll notice is the additional parameters passed to word_document. The md_extensions setting modifies the way pandoc parses the text; see the pandoc manual for a full list of extensions (or run pandoc --list-extensions=markdown in your terminal). We enable raw_html to make sure that pandoc does not discard raw HTML tags, and disable markdown_in_html_blocks to ensure that we get each HTML tag as a single block in pandoc's internal format.

The next setting is pandoc_args, where we tell pandoc to use a Lua filter to modify the document during conversion. The filter picks out all raw HTML blocks, parses them as HTML instead of Markdown, and replaces each raw HTML block with the parsing result.
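To check what the filter actually does with a given piece of HTML, a logging variant can help. The following is only a sketch, not part of the original answer: it behaves like read_html.lua but also writes the type of each parsed block (for example Header or Para) to stderr, which should show up in the render log.

```lua
-- Debugging sketch: same behaviour as read_html.lua, plus logging.
function RawBlock (raw)
  if raw.format:match 'html' and not FORMAT:match 'html' then
    local blocks = pandoc.read(raw.text, raw.format).blocks
    for _, blk in ipairs(blocks) do
      -- blk.t is the element type, e.g. "Header" for an <h1> tag.
      io.stderr:write('raw HTML block became: ', blk.t, '\n')
    end
    return blocks
  end
end
```

If a raw HTML block produces no log line, or only a generic block type, pandoc's HTML reader probably did not understand the markup.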

So if you are using raw HTML that pandoc can read, you'll be fine. If you are using special markup that pandoc cannot read, the setup described above won't help. In that case you'd have to rewrite the markup in OOXML, the XML format used in docx.
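As a rough illustration of the OOXML route, the filter could be extended to emit raw OOXML for markup that the HTML reader cannot translate. The sketch below assumes a hypothetical marker, an HTML comment <!-- pagebreak --> on its own line, which is not something the original answer uses. It replaces that marker with a WordprocessingML page break and sends all other raw HTML through the HTML reader as before. It is meant to replace the RawBlock function in read_html.lua, not to sit next to it in the same file.

```lua
-- Hypothetical marker: a raw HTML block containing only <!-- pagebreak -->
local PAGEBREAK_MARKER = '^%s*<!%-%-%s*pagebreak%s*%-%->%s*$'
-- WordprocessingML paragraph that holds a page break.
local PAGEBREAK_OOXML = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'

function RawBlock (raw)
  -- Leave non-HTML raw blocks and HTML output untouched.
  if not raw.format:match 'html' or FORMAT:match 'html' then
    return nil
  end
  -- For docx output, swap the marker for raw OOXML.
  if FORMAT:match 'docx' and raw.text:match(PAGEBREAK_MARKER) then
    return pandoc.RawBlock('openxml', PAGEBREAK_OOXML)
  end
  -- Everything else goes through pandoc's HTML reader as before.
  return pandoc.read(raw.text, raw.format).blocks
end
```

The same pattern should carry over to other constructs, provided you supply the matching OOXML fragment yourself.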
