Data encoding when submitting a PDF form using AcroForm technology

Question

When I create a PDF form (for instance using Acrobat) that contains text fields in AcroForm format (PDF dictionaries, no XFA), and I submit the data to a server, how can I specify or retrieve the encoding that will be used?

For instance, when I submit the Chinese glyphs '测试' (test), I receive the following headers and content on the server side:

accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
content-type: application/x-www-form-urlencoded
content-length: 23
acrobat-version: 10.1.4
user-agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC; .NET4.0C; AskTbCLA/5.15.1.22229)
accept-encoding: gzip, deflate
connection: Keep-Alive
Song=%b2%e2%ca%d4&Test=

There's no reference to an encoding, except x-www-form-urlencoded. The two glyphs are represented as four bytes: B2 E2 CA D4. After some investigation, I know that B2E2 is the GBK value for the first glyph and CAD4 is the GBK value for the second glyph, but I can't derive this from the request headers.
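To make the byte-level observation concrete, here is a minimal server-side sketch in Python. It assumes the server already knows (out of band) that the body is GBK-encoded, which is exactly the information the request does not carry. The variable names are illustrative only.

```python
from urllib.parse import parse_qs, unquote_to_bytes

# Request body exactly as received in the example above.
body = "Song=%b2%e2%ca%d4&Test="

# Percent-decoding alone only yields raw bytes; it says nothing about the charset.
raw = unquote_to_bytes("%b2%e2%ca%d4")
print(raw.hex(" "))  # the four bytes B2 E2 CA D4

# Assumption: we tell the parser the bytes are GBK. keep_blank_values keeps
# the empty "Test" field; errors="strict" makes a wrong guess fail loudly.
fields = parse_qs(body, keep_blank_values=True, encoding="gbk", errors="strict")
print(fields)
```

With the GBK assumption, the `Song` field decodes back to the two submitted glyphs; with any other charset the same bytes would produce an error or different characters.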

Is it always GBK? I want to change the data encoding by setting a specific key in a dictionary in the PDF, but there doesn't seem to be one. For instance, I would like to make sure the PDF always sends Unicode characters instead of GBK.

Note that I've already experimented with changing the default font (and encoding) of the text field. I've also searched ISO-32000-1 for encodings in fields, but all I found was a way to define non-Latin characters for check boxes, and some information about the encoding of an FDF file. Neither answered my question.

Recommended answer

I've just found the answer to my main question myself. I didn't find anything in ISO-32000-1 or the ISO-32000-2 draft, but while studying the Acrobat JavaScript reference, I found the cCharset parameter that is available for the submitForm() method. That parameter defines:

The encoding for the values submitted. String values are utf-8, utf-16, Shift-JIS, BigFive, GBK, and UHC. If not passed, the current Acrobat behavior applies. For XML-based formats, utf-8 is used. For other formats, Acrobat tries to find the best host encoding for the values being submitted. XFDF submission ignores this value and always uses utf-8.

In other words: in my case GBK was used because Acrobat considered it the best fit for the Chinese characters being submitted. However, one can force UTF-8 by calling the submitForm() JavaScript method with the appropriate cCharset value.
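For a server that may receive either kind of submission, the following Python sketch shows what the same two fields would look like on the wire if UTF-8 were forced, together with a hypothetical `decode_form` helper that tries UTF-8 first and falls back to GBK. The helper and its candidate list are assumptions for illustration, not part of Acrobat or any PDF library. UTF-8 is tried first because UTF-8 byte sequences often also decode without error as GBK, whereas the GBK bytes here are not valid UTF-8.

```python
from urllib.parse import parse_qs, urlencode


def decode_form(body, candidates=("utf-8", "gbk")):
    """Return (encoding, fields) for the first candidate that decodes cleanly.

    Hypothetical helper: this is a heuristic, not a reliable charset detector.
    """
    for encoding in candidates:
        try:
            fields = parse_qs(
                body, keep_blank_values=True, encoding=encoding, errors="strict"
            )
        except UnicodeDecodeError:
            continue  # bytes are not valid in this charset, try the next one
        return encoding, fields
    raise ValueError("body does not decode with any candidate encoding")


# What the example form would send if UTF-8 were used for the values.
utf8_body = urlencode({"Song": "测试", "Test": ""}, encoding="utf-8")
print(utf8_body)

# The body from the question falls through to GBK; the UTF-8 body does not.
print(decode_form("Song=%b2%e2%ca%d4&Test="))
print(decode_form(utf8_body))
```

Guessing on the server remains a workaround; fixing the charset at submission time, as described above, avoids the ambiguity altogether.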

Based on this question, I have asked the ISO committee to fix this problem in ISO-32000-2. As a result, an extra possible entry was added to the table entitled "Additional entries specific to a submit-form action" in section 12.7.6.2:

CharSet: string

(Optional; inheritable) Possible values include: utf-8, utf-16, Shift-JIS, BigFive, GBK, or UHC.

Starting with PDF 2.0, this problem will no longer exist.

Update: my suggestion made it into ISO 32000-2 (aka PDF 2.0):

The CharSet key doesn't exist in ISO 32000-1; it was introduced in ISO 32000-2.
