Can Apache FOP be used to convert arbitrary HTML to PDF?


Question

I have tried to use Apache FOP to convert HTML to PDF (HTML --> XHTML --> XSL-FO --> PDF). I used the xhtml2fo.xsl stylesheet from Antenna House for the XHTML --> XSL-FO conversion.

It works for simple HTML files.

It does not work for HTML files with styling (via embedded CSS or the style attribute). A PDF is created, but it is completely unformatted. I am trying to convert HTML files whose styling and content I do not have much control over.

Creating an XSLT for each HTML file is not practical in my use case.

Currently, I have a working implementation with Flying Saucer. However, the requirement calls for an implementation that is not under the AGPL license.

My question is: can this be achieved with FOP?

I appreciate any help.

Answer

tl;dr version:

In the most general case, no: you cannot use FOP to convert arbitrary HTML while preserving the original styles (and changing the formatter would not solve the problem).

However, you can use FOP (or any other formatter) to handle a large subset of HTML documents reasonably well; this could require some XSLT adjustments.


Why it cannot work in general

HTML --> XHTML --> XSL-FO --> PDF

Your description of the necessary transformation chain is spot on.

However, FOP is only involved in the last step: with the exception of the features that are not implemented yet, the final PDF file should respect the typographical characteristics expressed in the FO file.
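To see which step loses the formatting, it can help to run the chain in two separate steps and inspect the intermediate FO file. The following is a minimal sketch, assuming the `fop` launcher script is on the PATH and that its `-xml`, `-xsl`, `-foout`, `-fo` and `-pdf` options behave as described in the FOP command-line help; the function names and file names are hypothetical.

```python
import subprocess


def fop_commands(xhtml_path, xslt_path, fo_path, pdf_path, fop="fop"):
    """Return the two command lines: the XSLT step, then the formatting step."""
    # Step 1: only apply the stylesheet and save the intermediate XSL-FO file.
    to_fo = [fop, "-xml", xhtml_path, "-xsl", xslt_path, "-foout", fo_path]
    # Step 2: let FOP format the XSL-FO file into a PDF.
    to_pdf = [fop, "-fo", fo_path, "-pdf", pdf_path]
    return to_fo, to_pdf


def run_pipeline(xhtml_path, xslt_path, fo_path, pdf_path):
    """Run both steps, stopping at the first one that fails."""
    for command in fop_commands(xhtml_path, xslt_path, fo_path, pdf_path):
        subprocess.run(command, check=True)


# Example call (assumes FOP is installed and the files exist):
# run_pipeline("page.xhtml", "xhtml2fo.xsl", "page.fo", "page.pdf")
```

If the intermediate FO file already lacks attributes such as font-size or text-align, the problem lies in the XSLT step, not in FOP.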

"I used the xhtml2fo.xsl from Antenna House for the xhtml --> XSL-FO conversion [...]

A PDF is created but completely unformatted"

Is the stylesheet you are using the one from the Antenna House site?

From a quick look, it seems to convert the style="..." attribute into separate attributes in the FO output, but it does not process external CSS files.

As a result, HTML files styled with external CSS will be transformed into FO files without any formatting attributes (font-family, font-size, text-align, ...).

"Can this be achieved with FOP?"

If that is indeed the case, the formatter cannot do anything but use the default values, a few of which (font-family comes to mind) are application-dependent.

So, depending on the formatter you use, you will get a slightly different result, but still an "unformatted" one.

What you need is either a tool to "merge" the html and css files, inlining the styles so that the XSLT can process them, or a different stylesheet capable of taking into account the external css files (but I suspect it would not be easy to write one working in a general case).
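As an illustration of the "merge" idea, here is a minimal sketch of a style inliner using only the Python standard library. It is an assumption-laden toy, not a real CSS engine: it only understands plain `tag`, `.class` and `#id` selectors, ignores specificity and inheritance, does not understand at-rules such as @media, and simply applies rules in source order with the element's own style attribute last. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RULE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_rules(css):
    """Yield (selector, declarations) pairs from a flat list of CSS rules."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # drop CSS comments
    for selectors, body in RULE.findall(css):
        declarations = "; ".join(d.strip() for d in body.split(";") if d.strip())
        for selector in selectors.split(","):
            yield selector.strip(), declarations


def matches(element, selector):
    """Support only 'tag', '.class' and '#id'; anything else never matches."""
    if selector.startswith("."):
        return selector[1:] in element.get("class", "").split()
    if selector.startswith("#"):
        return element.get("id") == selector[1:]
    return selector == element.tag.split("}")[-1]  # compare the local name


def inline_styles(xhtml, css=""):
    """Copy matching CSS declarations into style attributes; return XHTML text."""
    ET.register_namespace("", XHTML_NS)  # keep XHTML as the default namespace
    root = ET.fromstring(xhtml)
    # Also pick up rules from embedded <style> elements.
    embedded = "".join(s.text or "" for s in root.iter(f"{{{XHTML_NS}}}style"))
    rules = list(parse_rules(css + embedded))
    for element in root.iter():
        applied = [decl for sel, decl in rules if matches(element, sel)]
        own = element.get("style")
        if own:
            applied.append(own.strip().rstrip(";"))  # inline style goes last
        if applied:
            element.set("style", "; ".join(applied))
    return ET.tostring(root, encoding="unicode")
```

The output can then be fed to a stylesheet that already converts style attributes. Documents relying on descendant selectors, pseudo-classes or the cascade would need a real CSS implementation.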

What can be fixed with little effort

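While processing HTML tables, the linked XSLT uses the fo:table-and-caption element, which is not supported by FOP, so the tables "disappear" from the output.

This can be fixed with a small change in the XSLT, or (probably a cleaner solution) with a custom stylesheet that includes the original one and overrides the table template. Because xsl:include gives both templates the same import precedence, the explicit priority attribute is what makes the override win: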

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:html="http://www.w3.org/1999/xhtml">

  <xsl:include href="xhtml2fo.xsl"/>

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="no"/>

  <xsl:template match="html:table" priority="2">
    <fo:table xsl:use-attribute-sets="table">
      <!-- warning: table caption is not processed! -->
      <xsl:call-template name="process-table"/>
    </fo:table>
  </xsl:template>

</xsl:stylesheet>

It is possible that the stylesheet you are actually using needs a few similar adjustments to better work in conjunction with FOP.
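One way to find out where such adjustments are needed is to scan the intermediate FO file for elements the formatter is known not to handle. The sketch below is only a starting point: the default list contains just fo:table-and-caption, the one element this answer mentions; any other names are for you to supply after checking the FOP compliance documentation. The function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
# Only the element mentioned above; extend this after checking the FOP docs.
SUSPECT_ELEMENTS = ("table-and-caption",)


def count_suspect_elements(fo_text, names=SUSPECT_ELEMENTS):
    """Return {"fo:name": count} for each listed element present in the FO text."""
    root = ET.fromstring(fo_text)
    counts = {}
    for name in names:
        found = sum(1 for _ in root.iter(f"{{{FO_NS}}}{name}"))
        if found:
            counts[f"fo:{name}"] = found
    return counts
```

A non-empty result points to the templates in the stylesheet that produce those elements and are candidates for an override like the one above.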

Disclosure: I'm a FOP developer, though not very active nowadays.
