pandoc command line parameters for resolving internal links

Problem description

My problem is similar to this post, but not identical. I can't figure out the correct pandoc command line parameters for maintaining or resolving cross-document links when the input is a couple of interlinked HTML files.

Let's say I have two files, chapter1.xhtml and chapter2.xhtml located in the /home/user/Documents folder with the following contents:

<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<h3>Chapter 1</h3>
<p><a href="/home/user/Documents/chapter2.xhtml">Next chapter</a><br /></p>

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
</body>
</html>

which contains a link to the next document.

<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<h3>Chapter 2</h3>
<p><a href="/home/user/Documents/chapter1.xhtml">Previous chapter</a><br /></p>

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
</body>
</html>

which contains a link to the previous document.

I used the following command line parameters:

$ pandoc -s --toc --verbose -o /home/user/Documents/output.markdown /home/user/Documents/chapter1.xhtml /home/user/Documents/chapter2.xhtml

I get the following output:

---
---

-   [Chapter 1](#chapter-1)
-   [Chapter 2](#chapter-2)

### Chapter 1

[Next chapter](/home/user/Documents/chapter2.xhtml)\

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

### Chapter 2

[Previous chapter](/home/user/Documents/chapter1.xhtml)\

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

This problem also occurs when I select docx or latex/pdf as the output format. I also tried to use relative links, but nothing worked.

What are the correct parameters for resolving cross-document links?

tl;dr: I don't want link references that contain the original paths; I want them to point to the new output document.

Recommended answer

The problem is that your links contain absolute paths (/home/user/Documents/chapter1.xhtml) instead of relative ones (chapter1.xhtml). I cannot imagine the ePUB file containing absolute paths, and if it does, the links in the file will only ever work correctly on your computer. So the solution will have to be fixing those ePUB files before feeding them to pandoc.
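One way to apply that fix, as a rough sketch: strip the directory prefix from each `href` with `sed` before running pandoc. The function name `relativize_links` and the hard-coded prefix `/home/user/Documents/` are assumptions taken from the question's example; neither is something pandoc provides.

```shell
# Hypothetical helper: turn absolute hrefs under a known prefix into relative ones.
# The prefix is taken from the question's example; adjust it to your own layout.
relativize_links() {
  sed 's|href="/home/user/Documents/|href="|g'
}

# Reads XHTML on stdin and writes the fixed markup to stdout.
printf '%s\n' '<a href="/home/user/Documents/chapter2.xhtml">Next chapter</a>' | relativize_links
```

This only makes the links portable. They still name the original chapter files rather than positions in a combined output document.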

Note that roundtripping with pandoc from markdown to epub and back to html works as expected:

$ pandoc -o foo.epub
# foo

adfs

# bar

go [to foo](#foo)


$ unzip foo.epub

$ cat ch002.xhtml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <title>bar</title>
  <link rel="stylesheet" type="text/css" href="stylesheet.css" />
</head>
<body>
<div id="bar" class="section level1">
<h1>bar</h1>
<p>go <a href="ch001.xhtml#foo">to foo</a></p>
</div>
</body>
</html>

$ pandoc foo.epub

<p><span id="ch001.xhtml"></span></p>
<div id="ch001.xhtml#foo" class="section level1">
<h1>foo</h1>
<p>adfs</p>
</div>
<p><span id="ch002.xhtml"></span></p>
<div id="ch002.xhtml#bar" class="section level1">
<h1>bar</h1>
<p>go <a href="#ch001.xhtml#foo">to foo</a></p>
</div>

PS

Using two input documents like:

pandoc -o output.md chapter1.xhtml chapter2.xhtml

works as the pandoc README states:


"If multiple input files are given, pandoc will concatenate them all (with blank lines between them) before parsing."

So for the parsing done by pandoc, it sees it as one document... so no wonder that cross-file links won't work.
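Given that, one possible workaround is to rewrite the cross-file links into in-document anchors before conversion. The sketch below assumes the heading identifiers shown in the question's table of contents (`#chapter-1`, `#chapter-2`) and that each `chapterN.xhtml` begins with the heading "Chapter N". The function name `links_to_anchors` is made up for illustration.

```shell
# Hypothetical helper: map href="<any dir>/chapterN.xhtml" to href="#chapter-N".
# Assumes the heading of chapterN.xhtml gets the pandoc identifier chapter-N,
# as in the table of contents from the question's output.
links_to_anchors() {
  sed -E 's|href="([^"]*/)?chapter([0-9]+)\.xhtml"|href="#chapter-\2"|g'
}

# Reads XHTML on stdin and writes the rewritten markup to stdout.
printf '%s\n' '<a href="/home/user/Documents/chapter2.xhtml">Next chapter</a>' | links_to_anchors
```

The rewritten chapters could then be passed to pandoc in place of the originals. Whether the anchors resolve in a given output format depends on the identifiers pandoc assigns to the headings there, so check the result rather than relying on the naming assumption.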
