How to get Unicode of the characters from PDF using java and PDFBox

Views: 766 | Published: 2018/12/21 20:04:12 | Tags: java pdf unicode pdfbox

Problem description

I am using Apache PDFBox and Java to parse PDFs and extract all the information from them. Text extraction works fine for English only. For other languages I get only special characters. For example, extracting the Arabic character ش gives the string "?" when printed. It works fine when I change the "Region and language" setting of my computer from English to Arabic, so I think extracting the Unicode values of the characters would solve this problem. Please help me get the Unicode values of the characters from the PDF, or suggest another way to solve this problem.
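Since the output changes with the system locale, one plausible reading is that the extracted `String` already holds the correct characters and they are lost only when printed through the platform default encoding. The sketch below is one way to check that, under stated assumptions: the text is assumed to come from a PDFBox call such as `new PDFTextStripper().getText(document)`, which is replaced here by a hard-coded stand-in string, and the class name `UnicodeDump` and method `toCodePoints` are hypothetical names used only for illustration.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.stream.Collectors;

final class UnicodeDump {

    // Formats every Unicode code point of the text as U+XXXX, separated by spaces.
    static String toCodePoints(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Assumption: stand-in for text returned by PDFBox text extraction.
        String extracted = "\u0634"; // the Arabic letter from the question

        // Print through an explicit UTF-8 stream instead of the platform default encoding.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(toCodePoints(extracted)); // ASCII-only, readable on any console
        out.println(extracted);               // needs a console that can render UTF-8
    }
}
```

If the code points printed this way are the expected ones, the extraction itself is fine and only the output encoding needs to change, for example by writing to a UTF-8 file or console stream as shown.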
