Generate PDF from JSON object containing content from TinyMCE (HTML)

Question

TL;DR

How do you create a PDF from a JSON object that contains a string written in HTML?

Example JSON:

{
  dimensions: {
    height: 297,
    width: 210
  },
  boxes: [
    {
      dimensions: {
        height: 10,
        width: 190
      },
      position: {
        x: 10,
        y: 10
      },
      content: "<h1>Hello StackOverflow</h1>, I think you are <strong>great</strong>! I hope someone can answer this!"
    }
  ]
}


Tech used in front-end: AngularJS 1.4.9, ui.tinymce, ment.io

Back-end: whatever works.

I want to be able to create templates for PDFs. The user writes some text in a textarea, using some variables that will later be replaced with actual data, and when the user presses a button, a PDF with the finished product should be returned. This should be very generic, so that it can be used for pretty much anything.

So, minimal example: The user writes a little text in TinyMCE like

<h1>Hello #[COMMUNITY]</h1>, I think you are <strong>great</strong>! I hope someone can answer this!

This text contains two variables that the user inserts with the help of the ment.io plugin. The actual variables are supplied by the controller. The text is written in an AngularJS version of TinyMCE that also has ment.io on top of it, which provides a nice view of the available variables.

When the user presses the Save button, a JSON object like the following is created, which is the template.

{
  dimensions: {
    height: 297,
    width: 210
  },
  boxes: [
    {
      dimensions: {
        height: 10,
        width: 190
      },
      position: {
        x: 10,
        y: 10
      },
      content: "user input"
    }
  ]
}

I have an Angular directive that can generate any number of boxes, in any size (generic-ho!). This part works great. You specify how big you want the 'page' to be, in mm, in the top-level dimensions object (the example uses the A4 paper size, 210 × 297 mm). Each box then defines how big it should be and where on the 'paper' it should go, and finally its content, which the user writes in a TinyMCE textarea.
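PDF user space is measured in points (1/72 inch) with the origin at the bottom-left corner, while this template uses millimetres. A minimal sketch of the conversion the generator needs, assuming box positions are measured from the top-left corner of the page (the answer's code notes that "PDFbox starts at the bottom, we start at the top"); PageGeometry and its method names are hypothetical:

```java
// Hypothetical helper: converts the template's millimetre values into PDF points
// and flips y, because the template measures from the top edge and PDF from the bottom.
final class PageGeometry {
    static final double MM_PER_INCH = 25.4;
    static final double POINTS_PER_INCH = 72.0;

    /** Millimetres to PDF points: (mm / 25.4) * 72. */
    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** A y position measured down from the top edge, expressed as PDF y (measured up from the bottom). */
    static double topLeftYToPdfY(double pageHeightMm, double yMm) {
        return mmToPoints(pageHeightMm - yMm);
    }
}
```

For A4 this gives roughly 595.28 × 841.89 points, and a box at y = 10 mm starts at about 813.54 points above the bottom edge.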

Next step: the back-end replaces the variables with actual data and passes the result on to the generator.
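A minimal sketch of that substitution step, assuming the #[NAME] placeholder syntax from the example above and values supplied as a Map. TemplateVariables and substitute are hypothetical names; escaping the values is a design choice so that data cannot inject markup into the HTML template, and unknown placeholders are left untouched:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical back-end step: replaces #[NAME] placeholders with values from a map.
final class TemplateVariables {
    private static final Pattern PLACEHOLDER = Pattern.compile("#\\[([A-Za-z0-9_]+)\\]");

    static String substitute(String html, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(html);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = values.get(m.group(1));
            // Keep the original placeholder text when no value is supplied.
            String replacement = value != null ? escapeHtml(value) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The content is HTML, so the data is escaped before it is inserted.
    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```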

Then we come to the tricky part: the actual generator. It should preferably accept JSON, so that any project is able to use it. The front-end and the PDF generator go hand in hand; they don't care what's in the middle. This means that the generator can be written in pretty much anything. I'm a Java developer though, so Java is preferable (hence the Java tag).

Solutions I've found:

PDFBox, but the problem is the content that TinyMCE produces. TinyMCE outputs HTML or XML, and PDFBox does not handle that at all. That means I would have to write my own HTML or XML parser to figure out where the user wants bold text, and where she wants italics, headings, another font, etc. I really don't want that; I've been burned by it before. On the other hand, PDFBox is great for placing text in the correct positions, even if it is only raw text.

I've read that iText can handle HTML, but the AGPL license pretty much rules it out.

I've also looked at Flying Saucer, which takes XHTML and creates a PDF, but it seems to rely on iText.

The solution I'm looking at now is a convoluted way of using Apache FOP. FOP works on an XSL-FO document, so the trouble here is to dynamically create that XSL-FO document. I've also read that the XSL-FO standard has been dropped, so I'm unsure how future-proof this approach will be. I've never worked with either FOP or XSLT, so the task seems daunting. What I'm currently looking at is this: take the output from TinyMCE and run it through something like JTidy to get XHTML; create an XSLT file from the XHTML (in some magical way); create an XSL-FO document from the XHTML and the XSLT; and then generate the PDF from the XSL-FO file. Please tell me there is an easier way.

I can't have been the first to want to do something like this. Yet searching for answers seems to yield very few actual results.

So my question is basically this: how do you create a PDF from a JSON object like the one above, which contains HTML, and get the resulting text to look like it does when you write it in TinyMCE? Keep in mind that the object can contain an unlimited number of boxes.

Answer

After some research and work, I decided to go with PDFBox for the generation after all. I've also been very strict about what I accept as content input: right now I only accept bold, italics and headings, so I look for <strong>, <em> and <h[1-6]> tags.
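One way to enforce such a whitelist before the generator sees the content is a small pre-processing pass. This is a hypothetical sketch (ContentNormalizer is not from the answer) that keeps only <strong>, <em> and <h1>..<h6>, turns </p> and <br> into line breaks and strips every other tag while keeping its text. A regex is only adequate here because the accepted input is so restricted; it is not a general HTML parser:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical whitelist pass, in the spirit of the answer's processContent().
final class ContentNormalizer {
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    static String normalize(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            String replacement;
            if (name.equals("strong") || name.equals("em") || name.matches("h[1-6]")) {
                // Supported formatting tag: keep it, lower-cased and without attributes.
                replacement = closing ? "</" + name + ">" : "<" + name + ">";
            } else if ((closing && name.equals("p")) || name.equals("br")) {
                replacement = "\n"; // paragraph end or line break becomes a newline
            } else {
                replacement = ""; // unsupported tag: drop the tag, keep the text around it
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```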

To begin with, I updated my input JSON a bit, mainly by adding more levels of wrapping:

{
   documents: [
      {
         pages: [
            {
               dimensions: {width: 210, height: 297},
               boxes: [
                  {
                     dimensions: {width: 190, height: 40},
                     placement: {x: 10, y: 10},
                     content: "Hello <strong>StackOverflow</strong>!"
                  }
               ]
            }
         ]
      }
   ]
}

The reason is that I want to be able to put lots of documents into the same PDF. Think of a mass mailing of letters: each document is slightly different, but you still want them all in one PDF. You could of course do all of this with just the pages level, but if one document spans several pages, I think it's nicer to keep the documents separated.
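The Generator below reads bundle.documents, document.pages, page.dimensions, page.boxes, box.dimensions, box.placement and box.content, so the wrapped JSON presumably binds to a handful of plain classes. A hypothetical sketch of that model (class names taken from the answer's code; the pageCount helper is an illustrative addition). Populating it from JSON with a library such as Jackson or Gson is assumed and not shown:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model mirroring the JSON: bundle -> documents -> pages -> boxes.
final class Dimensions { public float width; public float height; } // in mm
final class Placement  { public float x; public float y; }          // in mm, from the top-left

final class Box {
    public Dimensions dimensions = new Dimensions();
    public Placement placement = new Placement();
    public String content = ""; // HTML from TinyMCE, variables already replaced
}

final class Page {
    public Dimensions dimensions = new Dimensions();
    public List<Box> boxes = new ArrayList<>();
}

final class Document {
    public List<Page> pages = new ArrayList<>();
}

final class Bundle {
    public List<Document> documents = new ArrayList<>();

    /** Total number of PDF pages the bundle will produce. */
    int pageCount() {
        int n = 0;
        for (Document d : documents) {
            n += d.pages.size();
        }
        return n;
    }
}
```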

My actual code is about 500 lines long, so I won't paste it all here, just the basic parts, and that's still around 150 lines. Here goes:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.commons.lang3.StringEscapeUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

// Written against the PDFBox 2.x API. Helpers such as resetFont, processContent, fontHeight,
// currentFont, currentFontSize, parseForFontChange, calculateStringWidth, drawChar and
// drawWord belong to the full ~500-line class and are not shown here.
public class Generator {
   public static ByteArrayOutputStream generatePDF(final Bundle bundle) throws IOException {
      final ByteArrayOutputStream output = new ByteArrayOutputStream();

      final PDDocument pdf = new PDDocument();
      try {
         for (final Document document : bundle.documents) {
            for (final Page page : document.pages) {
               pdf.addPage(generatePage(pdf, page));
            }
         }
         pdf.save(output);
      } finally {
         pdf.close();
      }

      return output;
   }

   private static PDPage generatePage(final PDDocument document, final Page page) throws IOException {
      final float pageHeight = mmToPoints(page.dimensions.height);
      final PDRectangle rect = new PDRectangle(mmToPoints(page.dimensions.width), pageHeight);
      final PDPage pdPage = new PDPage(rect);
      final PDPageContentStream cs = new PDPageContentStream(document, pdPage);
      try {
         for (final Box box : page.boxes) {
            resetFont(cs); // Reset the font for each new box so missing closing tags don't mess up the next box.

            // Make the content prettier, e.g. strip all <p>, replace </p> with \n, strip all <div> tags, etc.
            final String pc = processContent(box.content);

            lines(Arrays.asList(pc.split("\n")), box, pageHeight, cs);
         }
      } finally {
         cs.close();
      }
      return pdPage;
   }

   private static float mmToPoints(final float mm) {
      // 1 inch == 72 points and 1 inch == 25.4 mm, so points = (mm / 25.4) * 72.
      return (float) ((mm / 25.4) * 72);
   }

   private static void addNewLine(final PDPageContentStream cs) throws IOException {
      cs.newLine(); // Moves down by the current leading (the PDF "T*" operator).
   }

   private static void lines(final List<String> lines, final Box box, final float pageHeight,
                             final PDPageContentStream cs) throws IOException {
      if (lines.isEmpty()) { return; }
      cs.beginText();
      // PDF measures y from the bottom of the page, the template from the top, so flip y.
      cs.newLineAtOffset(mmToPoints(box.placement.x), pageHeight - mmToPoints(box.placement.y));
      // Now we begin the tricky part
      for (int i = 0, length = lines.size(); i < length; ++i) {
         final String line = lines.get(i);
         final List<Word> wordList = new ArrayList<>();
         final String[] splitArray = line.split(" ");
         final float fontHeight = fontHeight(currentFont(), currentFontSize()); // Documented elsewhere
         cs.setLeading(fontHeight); // Same as writing "<fontHeight> TL" to the stream.
         // The start position is the top edge of the box, and text is drawn upwards from
         // its baseline, so move down one line first to keep the text inside the box.
         if (i == 0) { addNewLine(cs); }
         for (final String index : splitArray) {
            final String word = index + " "; // Splitting removed the spaces, so add one back to each word.
            final StringBuilder wordBuilder = new StringBuilder();
            boolean addWord = true;
            for (int j = 0, wordLength = word.length(); j < wordLength; ++j) {
               final char c = word.charAt(j);
               if (c == '<') { // Check for <strong> and friends.
                  final StringBuilder command = new StringBuilder();
                  if (addWord && wordBuilder.length() > 0) {
                     // Flush the text before the tag, using the font that was active for it.
                     wordList.add(new Word(wordBuilder.toString(), currentFont(), currentFontSize()));
                     wordBuilder.setLength(0);
                     addWord = false;
                  }
                  for (; j < wordLength; ++j) {
                     final char c1 = word.charAt(j);
                     command.append(c1);
                     if (c1 == '>') {
                        if (j + 1 < wordLength) { addWord = true; }
                        break;
                     }
                  }
                  final boolean b = parseForFontChange(command.toString());
                  if (!b) { // If it wasn't a command, append it to our text.
                     wordBuilder.append(command.toString());
                  }
               } else if (c == '&') { // Check for HTML-escaped entities.
                  final int longestHTMLEntityName = 24 + 2; // &ClockwiseContourIntegral;
                  final StringBuilder escapedChar = new StringBuilder();
                  escapedChar.append(c);
                  int k = 1;
                  for (; k < longestHTMLEntityName && j + k < wordLength; ++k) {
                     final char c1 = word.charAt(j + k);
                     if (c1 == '<' || c1 == '>') { break; } // Can't be an escaped char.
                     escapedChar.append(c1);
                     if (c1 == ';') { break; } // End of the entity.
                  }
                  if (escapedChar.indexOf(";") < 0) { k--; }
                  wordBuilder.append(StringEscapeUtils.unescapeHtml4(escapedChar.toString()));
                  j += k;
               } else {
                  wordBuilder.append(c);
               }
            }
            if (addWord) {
               wordList.add(new Word(wordBuilder.toString(), currentFont(), currentFontSize()));
            }
         }
         writeWords(wordList, box, cs);
         if (i < length - 1) { addNewLine(cs); }
      }
      cs.endText();
   }

   public static void writeWords(final List<Word> words, final Box box, final PDPageContentStream cs) throws IOException {
      final float boxWidth = mmToPoints(box.dimensions.width);
      float lineWidth = 0;
      for (final Word word : words) {
         lineWidth += word.width;
         if (lineWidth > boxWidth) { // Word doesn't fit on this line: wrap.
            addNewLine(cs);
            lineWidth = word.width;
         }
         if (lineWidth > boxWidth) { // Word longer than the box width: break it per character.
            lineWidth = 0;
            final String string = word.string;
            for (int i = 0, length = string.length(); i < length; ++i) {
               final char c = string.charAt(i);
               final float charWidth = calculateStringWidth(String.valueOf(c), word.font, word.fontSize);
               lineWidth += charWidth;
               if (lineWidth > boxWidth) {
                  addNewLine(cs);
                  lineWidth = charWidth;
               }
               drawChar(c, word.font, word.fontSize, cs);
            }
         } else {
            drawWord(word, cs);
         }
      }
   }
}

import org.apache.pdfbox.pdmodel.font.PDFont;

public class Word {
   public final String string;
   public final PDFont font;
   public final float fontSize;
   public final float width;
   public final float height;

   public Word(final String string, final PDFont font, final float fontSize) {
      this.string = string;
      this.font = font;
      this.fontSize = fontSize;
      // Measured once, up front; the helpers live in Generator and are not shown.
      this.width = Generator.calculateStringWidth(string, font, fontSize);
      this.height = Generator.calculateStringHeight(string, font, fontSize);
   }
}

I hope this helps someone else facing the same problem. The Word class is there so that you can wrap on words rather than on characters. Lots of other posts describe how to implement helper methods such as calculateStringWidth, so they are not included here.
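The wrapping logic in writeWords can be tried out without PDFBox. A minimal sketch under stated assumptions: the width function stands in for calculateStringWidth (with PDFBox it would typically be something like font.getStringWidth(s) / 1000 * fontSize), and finished lines are collected into a list instead of being drawn. WordWrap is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// PDFBox-free sketch of greedy wrapping: a word that doesn't fit starts a new line,
// and a word wider than the whole box is broken character by character.
final class WordWrap {
    static List<String> wrap(List<String> words, double boxWidth, ToDoubleFunction<String> width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        double lineWidth = 0;
        for (String word : words) {
            double w = width.applyAsDouble(word);
            if (lineWidth + w <= boxWidth) { // fits on the current line
                line.append(word);
                lineWidth += w;
                continue;
            }
            if (line.length() > 0) { // doesn't fit: finish the current line first
                lines.add(line.toString());
                line.setLength(0);
                lineWidth = 0;
            }
            if (w <= boxWidth) {
                line.append(word);
                lineWidth = w;
                continue;
            }
            for (char c : word.toCharArray()) { // wider than the box: break per character
                double cw = width.applyAsDouble(String.valueOf(c));
                if (lineWidth + cw > boxWidth && line.length() > 0) {
                    lines.add(line.toString());
                    line.setLength(0);
                    lineWidth = 0;
                }
                line.append(c);
                lineWidth += cw;
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With a monospace stand-in (width = number of characters) and a box 10 units wide, the words "Hello ", "Stack ", "Overflow! " and "abcdefghijklmnop " wrap to "Hello ", "Stack ", "Overflow! ", "abcdefghij" and "klmnop ". As in the answer, the trailing space appended to each word counts towards its width.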

See "How to Insert a Linefeed with PDFBox drawString" for newlines and fontHeight, and "How to generate multiple lines in PDF using Apache pdfbox" for string width.

In my case, the parseForFontChange method changes the current font and font size; the active values are returned by currentFont() and currentFontSize(). I use regexes like (?ui:(<strong>)) to check whether a bold tag was found. Use whatever suits you.
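A hypothetical sketch of such a parseForFontChange, tracking a small style state instead of a PDFont so that it runs without PDFBox. The base size, the heading sizes and the choice of Helvetica are assumptions, not values from the answer; the returned names are standard-14 PDF font names, which in PDFBox 2.x correspond to constants such as PDType1Font.HELVETICA_BOLD:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical style state that currentFont()/currentFontSize() could be derived from.
final class FontState {
    // (?ui): case-insensitive, Unicode-aware, like the answer's (?ui:(<strong>)).
    private static final Pattern FORMAT_TAG = Pattern.compile("(?ui)<(/?)(strong|em|h([1-6]))\\s*>");
    private static final float BASE_SIZE = 11f;                                 // assumption
    private static final float[] HEADING_SIZES = {24f, 20f, 16f, 14f, 12f, 11f}; // assumption

    boolean bold;
    boolean italic;
    int heading; // 0 = body text, 1..6 = <h1>..<h6>

    /** Returns true if the tag was a formatting command; otherwise the caller prints it as text. */
    boolean parseForFontChange(String tag) {
        Matcher m = FORMAT_TAG.matcher(tag);
        if (!m.matches()) {
            return false;
        }
        boolean open = m.group(1).isEmpty();
        String name = m.group(2).toLowerCase(Locale.ROOT);
        if (name.equals("strong")) {
            bold = open;
        } else if (name.equals("em")) {
            italic = open;
        } else {
            heading = open ? Integer.parseInt(m.group(3)) : 0;
        }
        return true;
    }

    float currentFontSize() {
        return heading == 0 ? BASE_SIZE : HEADING_SIZES[heading - 1];
    }

    /** Standard-14 font name for the current style; headings are rendered bold. */
    String currentFontName() {
        boolean b = bold || heading > 0;
        if (b && italic) return "Helvetica-BoldOblique";
        if (b) return "Helvetica-Bold";
        if (italic) return "Helvetica-Oblique";
        return "Helvetica";
    }
}
```

Bold and italic are simple on/off flags here, so nested identical tags are not counted; a stack would be needed if the input allowed that.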
