How to handle non-ASCII characters in Java while using PDPageContentStream/PDDocument
Question
I am using PDFBox to create PDFs from my web application. The web application is built in Java and uses JSF. It takes the content from a web-based form and puts it into a PDF document.
Example: a user fills in an inputTextArea (JSF tag) in the form, and that text is converted to a PDF. I am unable to handle non-ASCII characters.
How should I handle the non-ASCII characters, or at least strip them out, before putting the text into the PDF? Any suggestions or pointers to resources would be appreciated. Thanks!
Recommended answer
Since you're using JSF on JSP instead of Facelets (which implicitly uses UTF-8 already), take the following steps to keep the platform default charset from being used (it is often ISO-8859-1, which is the wrong choice for handling the majority of "non-ASCII" characters):
- Add the following line to the top of all JSPs:
<%@ page pageEncoding="UTF-8" %>
This sets the response encoding to UTF-8 and sets the charset of the HTTP response Content-Type header to UTF-8. The latter instructs the client (the web browser) to display the page, and to submit its form, using UTF-8.
- Create a Filter that does the following in its doFilter() method:
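request.setCharacterEncoding("UTF-8"); // must run before any request parameter is read
chain.doFilter(request, response); // continue with the rest of the filter chain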
Map it on the FacesServlet as follows:
<filter-mapping>
<filter-name>nameOfYourCharacterEncodingFilter</filter-name>
<servlet-name>nameOfYourFacesServlet</servlet-name>
</filter-mapping>
This sets the request encoding of all JSF POST requests to UTF-8.
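To see why the request encoding matters, here is a minimal sketch using only the JDK. It simulates a browser submitting UTF-8 bytes and the server decoding them with either ISO-8859-1 or UTF-8. The class and method names are made up for illustration, and the code does not involve the servlet container itself.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Simulates the failure mode: the browser sends UTF-8 bytes, but the
    // server decodes them as ISO-8859-1 because no request encoding was set.
    static String decodeAsLatin1(String submitted) {
        byte[] wire = submitted.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Simulates the fixed setup: both sides agree on UTF-8.
    static String decodeAsUtf8(String submitted) {
        byte[] wire = submitted.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String submitted = "caf\u00e9"; // "café", 4 characters

        // The accented letter occupies two bytes in UTF-8, so the wrong
        // charset turns it into two unrelated characters.
        System.out.println(decodeAsLatin1(submitted).length()); // 5
        System.out.println(decodeAsUtf8(submitted).length());   // 4
        System.out.println(decodeAsUtf8(submitted).equals(submitted)); // true
    }
}
```

The extra character in the first result is the typical "mojibake" that shows up in the generated document when the two sides disagree on the charset.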
This should fix the Unicode problem on the JSF side. I have never used PDFBox, so I can't vouch for how it handles Unicode/UTF-8 on its own side. Let me know if it still doesn't work after applying the fixes above.
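If stripping really is the fallback you want, as the question mentions, the following is a rough sketch using only the JDK. The class and method names are hypothetical, and the approach loses information, so it is best treated as a last resort rather than a substitute for getting the encoding right.

```java
import java.text.Normalizer;

public class AsciiFallback {

    // Removes every character outside the 7-bit ASCII range.
    static String stripNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    // Decomposes accented letters into base letter plus combining mark (NFD)
    // before stripping, so "é" degrades to "e" instead of disappearing.
    static String foldToAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return stripNonAscii(decomposed);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u20ac5"; // "café €5"
        System.out.println(stripNonAscii(text)); // caf 5
        System.out.println(foldToAscii(text));   // cafe 5
    }
}
```

Characters with no ASCII base letter, such as the euro sign, are dropped by both methods.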
- Unicode - How to get the characters right?