iTextSharp 5.5.13.1: "No data is available for encoding 10000" when extracting text from PDF

Views: 117 Published: 2021/5/18 18:42:23 Tags: c# itext .net-core-3.1

Problem description

I'm trying to extract text from multipage PDF documents. Almost all of them extract fine, but a couple blow up with the "encoding 10000" error. The only thing that distinguishes the failing pages is that they contain a button and form fields.

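            // pdfBytes holds the source PDF (flatBytes when testing the flattened copy)
            using ( var r = new PdfReader( pdfBytes ) )
            {
                var pageNumbersToSave = new List<int>();
                for (var i = 1; i <= r.NumberOfPages; i++)
                {
                    try
                    {
                        // This call throws on the affected pages
                        var s       = PdfTextExtractor.GetTextFromPage( r, i, new SimpleTextExtractionStrategy() );
                        // ... the rest of the try block, its catch handler, and the closing braces are cut off in the original post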

I also tried using a PdfStamper to flatten the form elements, but that didn't change anything:

            byte[] flatBytes;
            using ( var r = new PdfReader( pdfBytes ) )
            {
                using (var ms = new MemoryStream())
                {
                    using (var flattener = new PdfStamper(r, ms))
                    {
                        for ( var i = 1; i <= r.NumberOfPages; i++ )
                        {
                            r.AcroFields.RemoveFieldsFromPage( i );
                        }
                        flattener.FormFlattening = true;
                        flattener.Close();
                    }
                    flatBytes = ms.ToArray();
                }
            }

Obviously, when I was testing with the stamper, the top code used flatBytes rather than pdfBytes.

Full exception message: No data is available for encoding 10000. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.

Stack Trace:
   at System.Text.Encoding.GetEncoding(Int32 codepage)
   at System.Text.Encoding.GetEncoding(Int32 codepage, EncoderFallback encoderFallback, DecoderFallback decoderFallback)
   at iTextSharp.text.xml.simpleparser.IanaEncodings.GetEncodingEncoding(String name)
   at iTextSharp.text.pdf.PdfEncodings.ConvertToString(Byte[] bytes, String encoding)
   at iTextSharp.text.pdf.DocumentFont.FillEncoding(PdfName encoding)
   at iTextSharp.text.pdf.DocumentFont.DoType1TT()
   at iTextSharp.text.pdf.DocumentFont.Init()
   at iTextSharp.text.pdf.DocumentFont..ctor(PRIndirectReference refFont)
   at iTextSharp.text.pdf.CMapAwareDocumentFont..ctor(PRIndirectReference refFont)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.GetFont(PRIndirectReference ind)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.SetTextFont.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.FormXObjectDoHandler.HandleXObject(PdfContentStreamProcessor processor, PdfStream stream, PdfIndirectReference refi, ICollection markedContentInfoStack)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.DisplayXObject(PdfName xobjectName)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.Do.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
   at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener, IDictionary`2 additionalContentOperators)
   at iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(PdfReader reader, Int32 pageNumber, ITextExtractionStrategy strategy)
   at VerataParsers.ECWPdfExtractor.StripImagesFromPdf(Byte[] pdfBytes, Int32& adjustedPageCount) in C:\Users\Dell T5610\source\repos\SecureDirectMessaging\VerataParsers\ECWPdfExtractor.cs:line 85
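
The exception text points at Encoding.RegisterProvider, and the trace shows the failure inside System.Text.Encoding.GetEncoding(Int32) while iTextSharp is building a font encoding. Code page 10000 is the Mac OS Roman code page, which .NET Core does not expose by default. Assuming the app really targets .NET Core 3.1 (as the tags indicate) and that the System.Text.Encoding.CodePages provider is available, a minimal sketch of registering it could look like the following. The class and method names (EncodingSetup, RegisterCodePages, GetMacRoman) are hypothetical and not from the original post, and this has not been run against the failing documents.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original post.
public static class EncodingSetup
{
    // Call once at startup, before any PdfReader / PdfTextExtractor call.
    public static void RegisterCodePages()
    {
        // CodePagesEncodingProvider (System.Text.Encoding.CodePages) supplies
        // the legacy code pages that .NET Core leaves out by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Looks up code page 10000, the one named in the exception message.
    public static Encoding GetMacRoman()
    {
        return Encoding.GetEncoding(10000);
    }
}
```

With the provider registered, a call such as EncodingSetup.GetMacRoman() should return an Encoding instead of throwing, and the same lookup performed internally by iTextSharp would then be expected to succeed as well.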
