File upload fails when posting with Indy and the filename contains Greek characters

Question

I am trying to implement a POST to a web service. I need to send a file whose type is variable (.docx, .pdf, .txt) along with a JSON formatted string.

I have managed to post files successfully with code similar to the following:

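uses
  System.SysUtils, System.Classes, Data.DBXJSON, IdHTTP, IdMultipartFormData;

procedure DoRequest;
var
  Http: TIdHTTP;
  Params: TIdMultipartFormDataStream;
  RequestStream, ResponseStream: TStringStream;
  JRequest, JResponse: TJSONObject;
  url: string;
begin
  url := 'some_custom_service';

  // JResponse is assigned from the parsed reply, so it is not created up front
  // (an instance created here would leak when it is overwritten below).
  JResponse := nil;
  JRequest := TJSONObject.Create;
  try
    JRequest.AddPair('Pair1', 'Value1');
    JRequest.AddPair('Pair2', 'Value2');
    JRequest.AddPair('Pair3', 'Value3');

    // nil everything first so the finally block only frees what was created
    Http := nil;
    Params := nil;
    RequestStream := nil;
    ResponseStream := nil;
    try
      Http := TIdHTTP.Create(nil);
      // assumes the service replies with UTF-8 encoded JSON
      ResponseStream := TStringStream.Create('', TEncoding.UTF8);
      RequestStream := TStringStream.Create(JRequest.ToString, TEncoding.UTF8);

      Params := TIdMultipartFormDataStream.Create;
      // ceFileName is a control on the form that holds the path of the file to send
      Params.AddFile('File', ceFileName.Text, '').ContentTransfer := '';
      Params.AddFormField('Json', 'application/json', '', RequestStream);

      Http.Post(url, Params, ResponseStream);
      JResponse := TJSONObject.ParseJSONValue(ResponseStream.DataString) as TJSONObject;
    finally
      Params.Free;
      RequestStream.Free;
      ResponseStream.Free;
      Http.Free;
    end;
  finally
    JRequest.Free;
    JResponse.Free;
  end;
end;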

The problem appears when I try to send a file that contains Greek characters and spaces in the filename. Sometimes it fails and sometimes it succeeds.

After a lot of research, I noticed that the POST header is encoded by Indy's TIdFormDataField class using the EncodeHeader() function. When the post fails, the encoded filename in the header is split; when the post succeeds, it is not split.

For example:

  • Επιστολή εκπαιδευτικο.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#$D#$A' =?UTF-8?B?eA==?=, which fails.
  • Επιστολή εκπαιδευτικ.docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?=, which succeeds.
  • Επιστολή εκπαιδευτικ .docx is encoded as =?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx, which fails.

I have tried changing the encoding of the filename, the AContentType parameter of AddFile(), and the ContentTransfer property, but none of those changed the behavior, and I still get errors whenever the encoded filename is split.

Is this some kind of bug, or am I missing something?

My code works for every case except those I described above.

I am using Delphi XE3 with Indy10.

Answer

EncodeHeader() does have some known issues with Unicode strings:

EncodeHeader() needs to take encoding units into account when splitting data between adjacent encoded words

Basically, a MIME encoded word cannot be more than 75 characters long (counting the =?charset?encoding? prefix and the ?= suffix), so long text gets split up. But when a long Unicode string is encoded, any given Unicode character may be charset-encoded using 1 or more bytes, and EncodeHeader() does not yet avoid splitting a multi-byte character across two separate encoded words (some of its bytes in one word, the rest in the next). That is illegal and explicitly forbidden by RFC 2047 of the MIME spec.
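To make the 75-character limit concrete, here is a minimal Delphi sketch (not Indy's actual code) that computes how many bytes fit into one 'B' encoded word for a given charset, and splits a string's UTF-8 bytes only on character boundaries. MaxBase64PayloadBytes and SplitUtf8OnCharBoundaries are hypothetical helpers; the sketch assumes the source file is saved as UTF-8 so the Greek literal compiles correctly.

```pascal
program Utf8ChunkDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TByteChunks = array of TBytes;

// RFC 2047: an encoded word is at most 75 characters, counting "=?", the
// charset, "?B?", the encoded text and "?=". Base64 emits 4 characters for
// every 3 input bytes, so this is the byte budget of one 'B' encoded word.
function MaxBase64PayloadBytes(const Charset: string): Integer;
var
  Overhead: Integer;
begin
  Overhead := Length('=?') + Length(Charset) + Length('?B?') + Length('?=');
  Result := ((75 - Overhead) div 4) * 3;
end;

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
function IsUtf8Continuation(B: Byte): Boolean;
begin
  Result := (B and $C0) = $80;
end;

// Hypothetical helper: splits the UTF-8 bytes of S into chunks of at most
// MaxBytes bytes, backing up so that no multi-byte character is cut in half.
function SplitUtf8OnCharBoundaries(const S: string; MaxBytes: Integer): TByteChunks;
var
  Bytes: TBytes;
  Start, Stop, Count: Integer;
begin
  if MaxBytes < 4 then // a UTF-8 sequence can be up to 4 bytes long
    raise EArgumentException.Create('MaxBytes must be at least 4');
  Bytes := TEncoding.UTF8.GetBytes(S);
  SetLength(Result, 0);
  Start := 0;
  while Start < Length(Bytes) do
  begin
    Stop := Start + MaxBytes; // exclusive end of the candidate chunk
    if Stop >= Length(Bytes) then
      Stop := Length(Bytes)
    else
      // never end a chunk right before a continuation byte
      while IsUtf8Continuation(Bytes[Stop]) do
        Dec(Stop);
    Count := Length(Result);
    SetLength(Result, Count + 1);
    Result[Count] := Copy(Bytes, Start, Stop - Start);
    Start := Stop;
  end;
end;

var
  Chunks: TByteChunks;
  I: Integer;
begin
  Chunks := SplitUtf8OnCharBoundaries('Επιστολή εκπαιδευτικο.docx',
    MaxBase64PayloadBytes('UTF-8'));
  for I := 0 to High(Chunks) do
    Writeln('chunk ', I, ': ', Length(Chunks[I]), ' bytes');
end.
```

With 'UTF-8' the overhead is 12 characters, leaving room for 60 base64 characters, which is 45 bytes. The first example filename is 46 UTF-8 bytes (16 for 'Επιστολή', 1 for the space, 24 for 'εκπαιδευτικο', 5 for '.docx'), so it is split after '.doc' into 45 + 1 bytes, the same split Indy produced, and here the boundary happens to fall between characters. The second example is 44 bytes and fits in a single word.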

However, that is not what is happening in your examples.

In your first example, 'Επιστολή εκπαιδευτικο.docx' is too long to be encoded as a single MIME word, so it gets split into 'Επιστολή εκπαιδευτικο.doc' 'x' substrings, which are then encoded separately. This is legal in MIME for long text (though you might have expected Indy to split the text into 'Επιστολή' ' εκπαιδευτικο.doc' instead, or even 'Επιστολή' ' εκπαιδευτικο' '.doc'. That might be a possibility in a future release). Adjacent MIME words that are separated by only whitespace are meant to be concatenated together without separating whitespace when decoded, thus producing 'Επιστολή εκπαιδευτικο.docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικο.doc x' instead?).

In your second example, 'Επιστολή εκπαιδευτικ.docx' is short enough to be encoded as a single MIME word.

In your third example, 'Επιστολή εκπαιδευτικ .docx' gets split on the second whitespace (not the first) into 'Επιστολή εκπαιδευτικ' ' .docx' substrings, and only the first substring needs to be encoded. This is legal in MIME. When decoded, the decoded text is meant to be concatenated with the following unencoded text, preserving whitespace between them, thus producing 'Επιστολή εκπαιδευτικ .docx' again. If the server is not doing that, it has a flaw in its decoder (maybe it is decoding as 'Επιστολή εκπαιδευτικ.docx' instead?).
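The two decoding rules at work in the first and third examples can be sketched in Delphi as follows. This is not Indy's DecodeHeader(); it assumes a hypothetical token list (THeaderToken) in which the header value has already been unfolded, split into encoded words, plain text and whitespace, and each encoded word decoded to text.

```pascal
program MimeWordJoinDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical token model for an unfolded, tokenized header value.
  TTokenKind = (tkEncodedWord, tkText, tkWhitespace);
  THeaderToken = record
    Kind: TTokenKind;
    Value: string; // already-decoded text for tkEncodedWord
  end;

function Tok(Kind: TTokenKind; const Value: string): THeaderToken;
begin
  Result.Kind := Kind;
  Result.Value := Value;
end;

// RFC 2047, section 6.2: whitespace that separates two adjacent encoded words
// is ignored. Whitespace between an encoded word and ordinary text is kept.
function JoinDecodedTokens(const Tokens: array of THeaderToken): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Tokens) do
  begin
    if (Tokens[I].Kind = tkWhitespace) and (I > 0) and (I < High(Tokens)) and
       (Tokens[I - 1].Kind = tkEncodedWord) and
       (Tokens[I + 1].Kind = tkEncodedWord) then
      Continue; // drop the separator between two encoded words
    Result := Result + Tokens[I].Value;
  end;
end;

begin
  // First example: two encoded words separated by (unfolded) whitespace.
  Writeln(JoinDecodedTokens([Tok(tkEncodedWord, 'Επιστολή εκπαιδευτικο.doc'),
    Tok(tkWhitespace, ' '), Tok(tkEncodedWord, 'x')]));
  // Third example: an encoded word followed by unencoded text.
  Writeln(JoinDecodedTokens([Tok(tkEncodedWord, 'Επιστολή εκπαιδευτικ'),
    Tok(tkWhitespace, ' '), Tok(tkText, '.docx')]));
end.
```

A server that keeps the space in the first case, or drops it in the third, would produce exactly the wrong filenames guessed at above.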

If you run these example filenames through Indy's MIME header encoder/decoder, they do decode properly:

uses
  Vcl.Dialogs, IdCoderHeader; // ShowMessage; EncodeHeader() and DecodeHeader()

procedure ShowHeaderRoundTrips;
var
  s: String;
begin
  s := EncodeHeader('Επιστολή εκπαιδευτικο.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66zr8uZG9j?='#13#10' =?UTF-8?B?eA==?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικο.docx'

  s := EncodeHeader('Επιστολή εκπαιδευτικ.docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66LmRvY3g=?='
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ.docx'

  s := EncodeHeader('Επιστολή εκπαιδευτικ .docx', '', 'B', 'UTF-8');
  ShowMessage(s); // '=?UTF-8?B?zpXPgM65z4PPhM6/zrvOriDOtc66z4DOsc65zrTOtc+Fz4TOuc66?= .docx'
  s := DecodeHeader(s);
  ShowMessage(s); // 'Επιστολή εκπαιδευτικ .docx'
end;

So the problem seems to be on the server side decoding, not on Indy's client side encoding.

That being said, if you are using a fairly recent version of Indy 10 (Nov 2011 or later), TIdFormDataField has a HeaderEncoding property, which defaults to 'B' (base64) in Unicode environments. You could switch it to 'Q' (quoted-printable), but the same splitting logic applies to 'Q' as well, so that may or may not work for you either (you can try it):

with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := 'Q'; // <--- here
  HeaderCharSet := 'utf-8';
end;

Otherwise, a workaround might be to change the value to '8' (8-bit) instead, which effectively disables MIME encoding (but not charset encoding):

with Params.AddFile('File', ceFileName.Text, '') do
begin
  ContentTransfer := '';
  HeaderEncoding := '8'; // <--- here
  HeaderCharSet := 'utf-8';
end;

Just note that if the server is not expecting raw UTF-8 bytes for the filename, you might still run into problems (ie, 'Επιστολή' showing up as 'Î•Ï€Î¹ÏƒÏ„Î¿Î»Î®' if the server reads the bytes as Windows-1252, for instance).
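To see what that kind of misinterpretation looks like, here is a small Delphi sketch that takes the UTF-8 bytes of a filename and decodes them with a single-byte codepage instead of UTF-8. MisreadUtf8As is a hypothetical helper, and codepage 1252 is only an illustrative guess at what a misconfigured server might assume; the source file should be saved as UTF-8 so the literals compile correctly.

```pascal
program Utf8MojibakeDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Simulates a receiver that takes raw UTF-8 bytes and decodes them with the
// wrong (single-byte) codepage.
function MisreadUtf8As(const S: string; CodePage: Integer): string;
var
  WrongEncoding: TEncoding;
begin
  WrongEncoding := TEncoding.GetEncoding(CodePage);
  try
    Result := WrongEncoding.GetString(TEncoding.UTF8.GetBytes(S));
  finally
    WrongEncoding.Free; // GetEncoding returns a new instance owned by the caller
  end;
end;

begin
  // Each 2-byte Greek letter turns into two Latin-1 style characters.
  Writeln(MisreadUtf8As('Επιστολή εκπαιδευτικο.docx', 1252) =
    'Î•Ï€Î¹ÏƒÏ„Î¿Î»Î® ÎµÎºÏ€Î±Î¹Î´ÎµÏ…Ï„Î¹ÎºÎ¿.docx');
end.
```

Comparing strings rather than printing them avoids depending on the console's codepage to display the result.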
