How do I get proper text from a PDF during extraction in C#?

Views: 303 · Posted: 2019/6/7 11:31:13 · Tags: C#, XML, .NET, Visual-Studio


Problem description


I am writing a program to extract text from PDF files. The extraction works fine, and I have to convert a few symbols to their respective hex codes before saving the file as XML.

The issue is that, of all the symbols, only "■" causes trouble: when I save it into an XML file, it gets converted to "¦".

For now, I replace it manually before saving to the target file.

Please help.

What I have tried:

I just need a basic idea of how I can get rid of this.
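The post does not show the saving code, so what follows is only a minimal sketch under stated assumptions: the extracted text is already in a .NET `string`, and "hex codes" means XML hexadecimal character references such as `&#x25A0;` (■ is U+25A0). The class name `XmlHexEscaper` and the output path `output.xml` are hypothetical. The idea is to escape XML markup characters, emit every non-ASCII character as a reference, and write the file with an explicit UTF-8 encoding. The resulting file is pure ASCII, so no legacy code page can substitute a different glyph; such a substitution is one common cause of a symbol changing on save, though that is an assumption about this particular case.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not from the original post): makes extracted PDF text
// XML-safe and ASCII-only by escaping markup characters and writing every
// non-ASCII character as a hexadecimal character reference.
public static class XmlHexEscaper
{
    public static string Escape(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            switch (c)
            {
                case '&': sb.Append("&amp;"); continue;
                case '<': sb.Append("&lt;"); continue;
                case '>': sb.Append("&gt;"); continue;
                case '"': sb.Append("&quot;"); continue;
            }
            if (c < 0x80)
            {
                // Plain ASCII is copied as-is (control characters are not filtered here).
                sb.Append(c);
                continue;
            }
            // Characters above U+FFFF arrive as surrogate pairs; combine them
            // so exactly one reference is emitted per code point.
            int codePoint = c;
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, text[i + 1]);
                i++;
            }
            sb.Append("&#x").Append(codePoint.ToString("X4")).Append(';');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string extracted = "Item \u25A0 A & B"; // \u25A0 is ■
        string xml = "<text>" + Escape(extracted) + "</text>";

        // Explicit UTF-8 without a BOM, matching the declaration in the prolog.
        File.WriteAllText("output.xml",
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" + xml,
            new UTF8Encoding(false));

        Console.WriteLine(xml); // prints <text>Item &#x25A0; A &amp; B</text>
    }
}
```

If the file is instead produced with `XmlWriter`, passing the raw text to `WriteString` and choosing the encoding through `XmlWriterSettings.Encoding` is the more idiomatic route; pre-escaped text like the above would then be escaped a second time (`&` becoming `&amp;`), so the two approaches should not be mixed.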
