How to handle non-ASCII characters in Java while using PDPageContentStream/PDDocument

Question

I am using PDFBox to create PDFs from my web application. The web application is built in Java and uses JSF. It takes the content from a web-based form and puts it into a PDF document.

Example: A user fills in an inputTextArea (JSF tag) in the form, and its content is converted to a PDF. I am unable to handle non-ASCII characters.

How should I handle the non-ASCII characters, or at least strip them out, before putting the text into the PDF? Please help me with any suggestions or point me to any resources. Thanks!

Recommended answer

Since you're using JSF on JSP instead of Facelets (which is implicitly already using UTF-8), do the following steps to avoid the platform default charset being used (which is often ISO-8859-1, which is the wrong choice for handling of the majority of "non-ASCII" characters):


  1. Add the following line to the top of all JSPs:

<%@ page pageEncoding="UTF-8" %>

This sets the response encoding to UTF-8 and sets the charset of the HTTP response content type header to UTF-8. The latter instructs the client (web browser) to display the page and submit its form using UTF-8.

  2. Create a Filter which does the following in its doFilter() method:

request.setCharacterEncoding("UTF-8");

Map it on the FacesServlet as follows:

<filter-mapping>
    <filter-name>nameOfYourCharacterEncodingFilter</filter-name>
    <servlet-name>nameOfYourFacesServlet</servlet-name>
</filter-mapping>

This sets the request encoding of all JSF POST requests to UTF-8.
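For reference, a complete filter class for step 2 might look roughly like the sketch below. The class name `CharacterEncodingFilter` is hypothetical, and the code assumes the `javax.servlet` API is on the classpath (newer containers use the `jakarta.servlet` package instead). The class also needs a matching `<filter>` declaration in web.xml so that the `<filter-name>` used in the mapping resolves to it.

```java
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Hypothetical class name, chosen for illustration only.
public class CharacterEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        // No configuration is needed for this sketch.
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Must happen before any request parameter is read; once the
        // container has parsed the body, changing the encoding has no effect.
        request.setCharacterEncoding("UTF-8");
        // Hand the request on to the next filter or the FacesServlet.
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // Nothing to release.
    }
}
```

Because the encoding only takes effect if it is set before the first parameter is read, this filter should run ahead of any other filter that touches request parameters.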

This should fix the Unicode problem on the JSF side. I have never used PDFBox, so I can't speak for the PDF side from experience. Note that PDFBox is a standalone Apache library and does not use iText under the covers, so Unicode support there depends on PDFBox itself and on the font you draw the text with. Let me know if it still doesn't work after applying the above fixes.
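The question also asks about stripping non-ASCII characters as a fallback. Here is a minimal sketch using only the JDK. The class name `AsciiFallback` and the method `toAscii` are hypothetical names for illustration. It first decomposes accented letters so that the base letter survives, then drops everything outside 7-bit ASCII. The sample string uses Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

public class AsciiFallback {

    /** Hypothetical helper: reduces text to plain 7-bit ASCII, losing information. */
    public static String toAscii(String input) {
        // NFD splits an accented letter into base letter + combining mark
        // (for example U+00E9 becomes 'e' followed by U+0301).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove the combining marks, then remove anything still outside ASCII.
        return decomposed
                .replaceAll("\\p{M}", "")
                .replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "Crème brûlée costs 5€" written with escapes.
        String sample = "Cr\u00e8me br\u00fbl\u00e9e costs 5\u20ac";
        System.out.println(toAscii(sample)); // prints "Creme brulee costs 5"
    }
}
```

Characters without a decomposition, such as the euro sign or ß, are dropped entirely. The approach is lossy and is best kept as a last resort once the encoding fixes above are in place.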

See also:

  • Unicode - How to get the characters right?
