Stopping Pandoc from escaping single quotes when converting from HTML to Markdown

Question

If I convert a single quote ' from HTML to Markdown, it is automatically escaped:

% echo "'" | pandoc -f html -t markdown
\'

I'd like it to output the quote without the backslash, because the backslashes make text with contractions much harder to read.

I thought this might be due to the "all_symbols_escapable" option, but it still happens, even when I turn that off:

% echo "'" | pandoc -f html -t markdown-all_symbols_escapable
\'

It isn't a problem, however, for markdown_strict:

% echo "'" | pandoc -f html -t markdown_strict
'

Any suggestions? I'd like to use the default Pandoc Markdown with its options tweaked, or report this as a bug if it's not what others expect.

Answer

The escaping is related to pandoc's smart extension. This extension converts straight single quotes to the typographically correct opening/closing single quote or apostrophe when appropriate. This is clearest when looking at HTML output restricted to ASCII characters (--ascii), where the curly quotes show up as named entities:

% echo "'hello'" | pandoc -f markdown -t html --ascii
<p>&lsquo;hello&rsquo;</p>

% echo "let's" | pandoc -f markdown -t html --ascii
<p>let&rsquo;s</p>

This smart treatment of quotes can be disabled case by case by escaping the character:

% echo "let\'s" | pandoc -f markdown -t html --ascii
<p>let's</p>

or globally by disabling the smart extension for the Markdown reader:

% echo "let's" | pandoc -f markdown-smart -t html --ascii
<p>let's</p>

So whenever pandoc sees a straight ' character in HTML, it assumes that this character was chosen intentionally over the typographically correct curly quote, and therefore escapes it so that it won't be treated in a "smart" way when the Markdown is read back.

The solution is therefore to disable the smart extension for the Markdown writer, so that pandoc writes Markdown on the assumption that it will not be subjected to smart handling of quotes:

% echo "'" | pandoc -f html -t markdown-smart
'

The smart extension is already disabled when using markdown_strict, which is why you got the desired behavior in that case.
