Extract font height and rotation from PDF files with iText/iTextSharp


Question


I created some code to extract text and font height from a PDF file using iTextSharp, but it does not handle text rotation. How can that information be extracted or computed?

Here is the code:

using System;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Create PDF reader
var reader = new PdfReader("myfile.pdf");

for (var k = 1; k <= reader.NumberOfPages; ++k)
{
    // Get page resources
    var page = reader.GetPageN(k);
    var pdfResources = page.GetAsDict(PdfName.RESOURCES);

    // Create custom render listener and processor, then process the page
    var listener = new FunnyRenderListener();
    var processor = new PdfContentStreamProcessor(listener);
    var bytes = ContentByteUtils.GetContentBytesForPage(reader, k);
    processor.ProcessContent(bytes, pdfResources);
}

reader.Close();

// [...]

public class FunnyRenderListener : IRenderListener
{
    // [...] (the other IRenderListener members: BeginTextBlock, EndTextBlock, RenderImage)

    // Must be public to implement IRenderListener.RenderText
    public void RenderText(TextRenderInfo renderInfo)
    {
        // Get text
        var text = renderInfo.GetText();

        // Get (computed) font size from an axis-aligned rectangle spanning
        // the descent line start and the ascent line end.
        // This only measures the text height when the text is not rotated.
        var bottomLeftPoint = renderInfo.GetDescentLine().GetStartPoint();
        var topRightPoint = renderInfo.GetAscentLine().GetEndPoint();
        var rectangle = new Rectangle(
            bottomLeftPoint[Vector.I1], bottomLeftPoint[Vector.I2],
            topRightPoint[Vector.I1], topRightPoint[Vector.I2]
        );
        var fontSize = Convert.ToDouble(rectangle.Height);

        Console.WriteLine("Text: {0}, FontSize: {1}", text, fontSize);
    }
}

Answer


The information you need, i.e. the text rotation, is not directly available as a TextRenderInfo member, but TextRenderInfo does have the method

/**
 * Gets the baseline for the text (i.e. the line that the text 'sits' on)
 * This value includes the Rise of the draw operation - see getRise() for the amount added by Rise
 */
public LineSegment GetBaseline()


By text rotation you most likely mean the angle of this line relative to the horizontal. With some simple math you can therefore calculate the rotation from this LineSegment: take the vector from its start point to its end point and compute atan2(Δy, Δx).
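A minimal C# sketch of that computation. The helper name `TextRotation.AngleDegrees` is hypothetical (it is not an iTextSharp API). It takes the start and end coordinates of a LineSegment such as the one returned by GetBaseline() and returns the angle against the x axis using Math.Atan2. The comments show how it might be fed from inside RenderText, assuming the iTextSharp 5 LineSegment/Vector types already used in the question.

```csharp
using System;

// Hypothetical helper (not part of iTextSharp): angle of a line segment
// relative to the positive x axis, in degrees.
//
// Possible use inside RenderText (iTextSharp 5):
//   var baseline = renderInfo.GetBaseline();
//   var start = baseline.GetStartPoint();
//   var end = baseline.GetEndPoint();
//   double angle = TextRotation.AngleDegrees(
//       start[Vector.I1], start[Vector.I2], end[Vector.I1], end[Vector.I2]);
public static class TextRotation
{
    // (x1, y1): segment start point, (x2, y2): segment end point.
    // Returns a value in [-180, 180]: 0 for normal left-to-right text,
    // 90 for text running bottom-to-top, -90 for text running top-to-bottom.
    public static double AngleDegrees(double x1, double y1, double x2, double y2)
    {
        // Atan2 uses the signs of both components, so all four quadrants are handled.
        double radians = Math.Atan2(y2 - y1, x2 - x1);
        return radians * 180.0 / Math.PI;
    }
}
```

Using atan2 on the full direction vector, rather than atan of the slope, keeps vertical text (Δx = 0) from dividing by zero and distinguishes upside-down text (Δx < 0) from normal text.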


PS: Looking at your code, you already use the ascent line and the descent line. You can use either of them instead of the baseline as well, since for a given text chunk they run parallel to it.
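The same two lines can also fix the height computation. The question's axis-aligned Rectangle only measures the text height for unrotated text; once the text is rotated its Height mixes in the length of the chunk (for text running bottom-to-top it is roughly the text length). A sketch of a rotation-independent alternative, assuming a hypothetical helper `TextHeight.BetweenLines` (not an iTextSharp API): it returns the perpendicular distance between the ascent line and the descent line, which equals the question's rectangle height when the text is horizontal.

```csharp
using System;

// Hypothetical helper (not part of iTextSharp): perpendicular distance between
// the ascent line and the descent line of a text chunk. Unlike the height of an
// axis-aligned rectangle, this value does not change when the text is rotated.
//
// Possible use inside RenderText (iTextSharp 5):
//   var a1 = renderInfo.GetAscentLine().GetStartPoint();
//   var a2 = renderInfo.GetAscentLine().GetEndPoint();
//   var d1 = renderInfo.GetDescentLine().GetStartPoint();
//   double height = TextHeight.BetweenLines(
//       a1[Vector.I1], a1[Vector.I2], a2[Vector.I1], a2[Vector.I2],
//       d1[Vector.I1], d1[Vector.I2]);
public static class TextHeight
{
    // (ax1, ay1)-(ax2, ay2): ascent line start and end points.
    // (px, py): any point on the descent line, e.g. its start point.
    public static double BetweenLines(double ax1, double ay1, double ax2, double ay2,
                                      double px, double py)
    {
        // Direction of the ascent line
        double ux = ax2 - ax1, uy = ay2 - ay1;
        double length = Math.Sqrt(ux * ux + uy * uy);

        // A zero-length line has no direction, so the distance is undefined
        if (length == 0) return double.NaN;

        // |cross(u, P - A)| / |u| is the distance from point P to the line through A along u
        double cross = ux * (py - ay1) - uy * (px - ax1);
        return Math.Abs(cross) / length;
    }
}
```

Measuring perpendicular to the lines, instead of along the page's y axis, is what makes the result independent of the rotation angle.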
