Data encoding when submitting a PDF form using AcroForm technology
Question
When I create a PDF form (for instance using Acrobat) that contains text fields in AcroForm format (PDF dictionaries, no XFA), and I submit the data to a server, how can I specify or retrieve the encoding that will be used?
For instance, when I submit the Chinese glyphs '测试' (test), I receive the following headers and content on the server side:
accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
content-type: application/x-www-form-urlencoded
content-length: 23
acrobat-version: 10.1.4
user-agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC; .NET4.0C; AskTbCLA/5.15.1.22229)
accept-encoding: gzip, deflate
connection: Keep-Alive
Song=%b2%e2%ca%d4&Test=
There's no reference to an encoding, except x-www-form-urlencoded. The two glyphs are represented as four bytes: B2 E2 CA D4. After some investigation, I know that B2E2 is the GBK value for the first glyph and CAD4 is the GBK value for the second glyph, but I can't derive this from the request headers.
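To make the ambiguity concrete, here is a minimal server-side sketch in JavaScript. It turns a percent-encoded value into raw bytes without assuming a charset, then decodes those bytes with a charset the caller picks. The function names (`percentDecodeToBytes`, `decodeFormValue`, `isValidUtf8`) are made up for this illustration, and it assumes a runtime that provides the standard `TextDecoder` global.

```javascript
// Turn a percent-encoded form value into raw bytes, without assuming a charset.
function percentDecodeToBytes(value) {
  const bytes = [];
  for (let i = 0; i < value.length; i++) {
    const ch = value[i];
    const hex = value.slice(i + 1, i + 3);
    if (ch === "%" && /^[0-9a-fA-F]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // %b2 -> 0xB2
      i += 2;
    } else if (ch === "+") {
      bytes.push(0x20); // "+" stands for a space in x-www-form-urlencoded
    } else {
      bytes.push(ch.charCodeAt(0) & 0xff); // unescaped characters are ASCII here
    }
  }
  return Uint8Array.from(bytes);
}

// Decode with an explicitly chosen charset; throws on invalid byte sequences.
function decodeFormValue(value, charset) {
  const decoder = new TextDecoder(charset, { fatal: true });
  return decoder.decode(percentDecodeToBytes(value));
}

// True if the value's bytes form valid UTF-8, false otherwise.
function isValidUtf8(value) {
  try {
    decodeFormValue(value, "utf-8");
    return true;
  } catch (e) {
    return false;
  }
}
```

With the body above, `isValidUtf8("%b2%e2%ca%d4")` is false, so the server can at least tell that the value is not UTF-8. On runtimes whose `TextDecoder` supports the `gbk` label (for example Node.js builds with full ICU), `decodeFormValue("%b2%e2%ca%d4", "gbk")` should yield the original two glyphs. The server still has to know or guess that GBK is the right choice.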
Is it always GBK? I want to change the data encoding by setting a specific key in a dictionary in the PDF, but there doesn't seem to be one. For instance, I would like to make sure the PDF always sends Unicode characters instead of GBK.
Note that I've already experimented with changing the default font (and encoding) of the text field. I've also searched ISO-32000-1 for encodings in fields, but all I found was a way to define non-Latin characters for check boxes, and some information about the encoding of an FDF file. Neither answered my question.
Recommended answer
I've just found the answer to my main question myself. I didn't find anything in ISO-32000-1 or the ISO-32000-2 draft, but while studying the Acrobat JavaScript reference, I found the cCharset parameter of the submitForm() method. That parameter defines:
The encoding for the values submitted. String values are utf-8, utf-16, Shift-JIS, BigFive, GBK, and UHC. If not passed, the current Acrobat behavior applies. For XML-based formats, utf-8 is used. For other formats, Acrobat tries to find the best host encoding for the values being submitted. XFDF submission ignores this value and always uses utf-8.
In other words: in my case GBK was used because Acrobat considered it the best fit for submitting Chinese characters. However, one can force UTF-8 by calling the submitForm() JavaScript method with cCharset set to utf-8.
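As a rough sketch of what such a call could look like: the snippet below wraps `submitForm()` in a small helper so the argument object is easy to see. The helper name `submitAsUtf8` and the URL are placeholders invented for this example. `cURL`, `cSubmitAs` and `cCharset` are parameter names from the Acrobat JavaScript reference, but the exact combination you need may differ for your form.

```javascript
// Sketch only: `doc` stands for the Acrobat Doc object, which is `this`
// in a document-level or button script. The URL is a placeholder.
function submitAsUtf8(doc, url) {
  doc.submitForm({
    cURL: url,          // where the form data is posted
    cSubmitAs: "HTML",  // url-encoded submission, as in the question
    cCharset: "utf-8"   // request UTF-8 instead of Acrobat's best-fit host encoding
  });
}

// Inside Acrobat, for example in a button's Mouse Up action:
// submitAsUtf8(this, "https://example.com/submit");
```

Note that, per the quoted reference text, XFDF submissions ignore `cCharset` and always use utf-8, so the parameter only matters for the other formats.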
Based on this question, I asked the ISO committee to fix this problem in ISO-32000-2. As a result, an extra entry was added to the table titled "Additional entries specific to a submit-form action" in section 12.7.6.2:
CharSet: string
(Optional; inheritable) Possible values include: utf-8, utf-16, Shift-JIS, BigFive, GBK, or UHC.
Starting with PDF 2.0, this problem will no longer exist.
Update: my suggestion made it into ISO 32000-2 (aka PDF 2.0):
The CharSet key doesn't exist in ISO 32000-1; it was introduced in ISO 32000-2.