How to read a table from PDF using iTextSharp?

Views: 2093   Posted: 2018/11/16 16:33:23   Tags: itextsharp

Question

I am having a problem reading a table from a PDF file. It's a very simple PDF file with some text and a table. The tool I am using is iTextSharp. I know there is no table concept in PDF. After some googling, someone said it might be possible to achieve this using iTextSharp plus a custom ITextExtractionStrategy, but I have no idea how to start. Can someone please give me some hints, or a small piece of sample code?

Cheers

Recommended answer

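This code reads the table content. In this PDF, all the values appear in the page content stream as literal strings followed by the Tj operator, i.e. (value) Tj, so we look for all of them; you can then do whatever you want with the resulting strings. Note that this only catches text drawn with Tj: text drawn with TJ arrays or hex strings, or in fonts with custom encodings, will be missed or garbled.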
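
To see what the (value) Tj pattern looks like in practice, here is a minimal sketch that pulls the Tj operands out of a hand-written content-stream fragment and resolves the backslash escapes PDF allows inside literal strings. The TjOperands class and the sample fragment are hypothetical illustrations, not iTextSharp API; the sketch assumes the stream text has already been obtained (for example from PdfReader.GetPageContent) and it ignores TJ arrays, hex strings and font encodings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the string operands of Tj from decoded content-stream text.
public static class TjOperands
{
    // "( ... )" followed by Tj; backslash escapes such as \( and \) may appear inside,
    // but nested unescaped parentheses (also legal in PDF) are not handled.
    static readonly Regex TjPattern = new Regex(@"\(((?:\\.|[^\\)])*)\)\s*Tj");

    public static List<string> Extract(string contentStream)
    {
        return TjPattern.Matches(contentStream)
                        .Cast<Match>()
                        .Select(m => Unescape(m.Groups[1].Value))
                        .ToList();
    }

    // Resolves the literal-string escapes \n \r \t \b \f \( \) \\ and \ddd (octal).
    // For any other character after a backslash, the backslash is simply dropped.
    static string Unescape(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] != '\\' || i + 1 >= s.Length) { sb.Append(s[i]); continue; }
            char next = s[++i];
            switch (next)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case 'b': sb.Append('\b'); break;
                case 'f': sb.Append('\f'); break;
                default:
                    if (next >= '0' && next <= '7')
                    {
                        // Up to three octal digits form one byte value, e.g. \351 -> 233.
                        int value = next - '0';
                        for (int d = 1; d < 3 && i + 1 < s.Length && s[i + 1] >= '0' && s[i + 1] <= '7'; d++)
                            value = value * 8 + (s[++i] - '0');
                        sb.Append((char)value);
                    }
                    else
                    {
                        sb.Append(next); // covers \( \) \\ as well
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}
```

Mapping an octal byte straight to a char assumes a Latin-1-like font encoding; for real documents the font's encoding decides what each byte means.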

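    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using iTextSharp.text.pdf;

    public class PdfTableReader
    {
        // Plain file-system path; a "~\" prefix is an ASP.NET virtual path that PdfReader does not resolve.
        string _filePath = @"MyPDF.pdf";

        // A literal string "( ... )" followed by the Tj operator, allowing whitespace in between
        // and backslash escapes such as \) inside the string.
        static readonly Regex TjOperand = new Regex(@"\(((?:\\.|[^\\)])*)\)\s*Tj");

        public List<String> Read()
        {
            var pages = new List<String>();

            // Dispose (close) the reader when done to release the underlying file.
            using (var pdfReader = new PdfReader(_filePath))
            {
                // iTextSharp page numbers are 1-based.
                for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                {
                    // GetPageContent returns the page's content-stream bytes with stream filters decoded.
                    // ISO-8859-1 maps each byte to exactly one char, so no byte is lost or altered.
                    string textFromPage = Encoding.GetEncoding("ISO-8859-1")
                                                  .GetString(pdfReader.GetPageContent(page));

                    pages.Add(GetDataConvertedData(textFromPage));
                }
            }

            return pages;
        }

        string GetDataConvertedData(string textFromPage)
        {
            // One value per line so adjacent cells do not run together.
            // Escape sequences such as \( are returned as they appear in the stream.
            return string.Join("\n", TjOperand.Matches(textFromPage)
                                              .Cast<Match>()
                                              .Select(m => m.Groups[1].Value));
        }
    }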
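
The question also mentions a custom ITextExtractionStrategy. In iTextSharp 5.x such a strategy is driven by PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy), and its RenderText(TextRenderInfo) method can read each chunk's position from renderInfo.GetBaseline().GetStartPoint(), whose [Vector.I1] and [Vector.I2] components are x and y. The sketch below covers only the step that turns recorded (x, y, text) chunks into table rows, in plain C#; the TextChunk type, the RowBuilder.BuildRows name and the 2-point tolerance are illustrative assumptions, not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one positioned piece of text, e.g. captured in RenderText.
public sealed class TextChunk
{
    public float X { get; }
    public float Y { get; }
    public string Text { get; }
    public TextChunk(float x, float y, string text) { X = x; Y = y; Text = text; }
}

public static class RowBuilder
{
    // Chunks whose baselines lie within `tolerance` points of a row's first chunk join that row.
    // PDF user space has y growing upwards, so rows come out top of page first;
    // cells within a row are ordered left to right by x.
    public static List<List<string>> BuildRows(IEnumerable<TextChunk> chunks, float tolerance = 2f)
    {
        var rows = new List<List<TextChunk>>();
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            var last = rows.LastOrDefault();
            if (last != null && Math.Abs(last[0].Y - chunk.Y) <= tolerance)
                last.Add(chunk);
            else
                rows.Add(new List<TextChunk> { chunk });
        }
        return rows.Select(r => r.OrderBy(c => c.X).Select(c => c.Text).ToList()).ToList();
    }
}
```

Grouping by y with a tolerance absorbs small baseline differences between cells of the same row; splitting a row into columns by x gaps would be the natural next step if one cell's text arrives as several chunks.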
