Can I use Telerik Document Processing to read PDF content?

Views: 82 · Posted: 2021/4/1 22:08:16 · Tags: c# .net pdf telerik

Question

I am working on a project where Telerik's Document Processing libraries are available to me, and I was hoping to use them to read a PDF file and search for specific text that I can use for further processing. But while the code to do so seems straightforward, I am not getting the expected results. This is the proof of concept I put together:

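        using System;
        using System.IO;
        using Telerik.Windows.Documents.Fixed.FormatProviders.Pdf;
        using Telerik.Windows.Documents.Fixed.Model;
        using Telerik.Windows.Documents.Fixed.Model.Text;

        // The stream is disposed only after all pages have been enumerated.
        using (var fs = new FileStream("..\\some.pdf", FileMode.Open, FileAccess.Read))
        {
            RadFixedDocument doc = new PdfFormatProvider(fs).Import();

            var pageCt = 0;
            var elementCt = 0;
            foreach (var page in doc.Pages) {
                pageCt += 1;
                Console.WriteLine($"Page {pageCt}, (Has content: {page.HasContent}, {page.Content.Count})");
                foreach (var contentEl in page.Content) {
                    elementCt += 1;
                    Console.WriteLine($"Element {elementCt}");
                    // Text runs are exposed as TextFragment elements; print their text.
                    if (contentEl is TextFragment fragment) {
                        string text = fragment.Text;
                        Console.WriteLine(text);
                        // if (text.Contains("{{CustomTag}}")) {
                        //     Console.WriteLine(text);
                        // } else {
                        //     Console.Write(".");
                        // }
                    }
                    else {
                        // Any other element (paths, images, ...): report its type.
                        Console.WriteLine($"Content Type: {contentEl.GetType()}");
                    }
                }
            }
        }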

I have tested this on a number of documents. The page count appears to be correct, but every page reports HasContent as false and has an empty Content collection.

Am I not correct in thinking I should be able to step through the PDF content elements this way?
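Once text fragments are actually being returned, the commented-out check in the loop above tests each TextFragment on its own. A PDF producer may split a string such as {{CustomTag}} across several fragments, so a per-fragment Contains can miss it. The following is a minimal sketch of one way to do the search instead, assuming the TextFragment.Text values have already been collected per page, in the order the loop encounters them (which may not match visual reading order). FindPagesContainingTag is a hypothetical helper, not part of the Telerik API, and the sample data is made up. It uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a Telerik API): returns the 1-based numbers of the
// pages whose combined fragment text contains `tag`.
// Each inner list holds the TextFragment.Text values collected from one page.
static List<int> FindPagesContainingTag(IReadOnlyList<IReadOnlyList<string>> pages, string tag)
{
    var matches = new List<int>();
    for (int i = 0; i < pages.Count; i++)
    {
        // Join fragments with no separator so a tag split across fragments
        // (e.g. "{{Custom" + "Tag}}") is still found.
        string pageText = string.Concat(pages[i]);
        if (pageText.IndexOf(tag, StringComparison.Ordinal) >= 0)
        {
            matches.Add(i + 1);
        }
    }
    return matches;
}

// Made-up fragment data standing in for a real document.
var samplePages = new List<IReadOnlyList<string>>
{
    new List<string> { "Invoice ", "number: 42" },
    new List<string> { "Signed by ", "{{Custom", "Tag}}" },
    new List<string> { "{{CustomTag}}" },
};

var hits = FindPagesContainingTag(samplePages, "{{CustomTag}}");
Console.WriteLine(string.Join(", ", hits)); // prints "2, 3"
```

Joining fragments without a separator is a trade-off: it catches tags that were split across fragments, but it can also run adjacent words together. For a delimited token like {{CustomTag}} that is unlikely to cause false matches.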
