TIdHTTP character encoding of POST response

Question

Consider the following situation:

procedure Test;

var
 Response : String;

begin
 Response := IdHttp.Post(MyUrL, AStream);
 DoSomethingWith(Response);
end;

Now the web server returns data in UTF-8. Suppose it returns some UTF-8 XML containing the character é. The variable Response does not contain this character but its UTF-8 byte sequence (#C3#A9), so it seems Indy did no decoding?

Now, I know how to work around this:

procedure Test;

var
 Response : String;

begin
 Response := UTF8ToString(IdHttp.Post(MyUrL, AStream));
 DoSomethingWith(Response);
end;

One caveat with this solution: Delphi raises warning W1058 (Implicit string cast with potential data loss from 'string' to 'RawByteString')

My question: is this the correct way to deal with this problem, or can I instruct TIdHTTP to do the conversion to UnicodeString for me?

Recommended answer

If you are using an up-to-date version of Indy 10, then the overloaded version of TIdHTTP.Post() that returns a String does decode the data to Unicode. However, the actual charset used for the decoding depends on the media type that the HTTP Content-Type response header specifies:


  1. if the media type is application/xml, application/xml-external-parsed-entity, or application/xml-dtd, or is not a text/... type but does end with +xml, then the charset specified in the encoding attribute of the XML's prolog is used. If no charset is specified, UTF-8 is used.

  2. otherwise, if the Content-Type response header specifies a charset, then it is used.

  3. otherwise, if the media type is a text/... type, then:

     a. if the media type is text/xml, text/xml-external-parsed-entity, or ends with +xml, then us-ascii is used.

     b. otherwise, ISO-8859-1 is used.

  4. otherwise, Indy's default encoding (ASCII by default) is used.
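
The rules above can be restated as a small decision function. This is only an illustrative sketch of the order in which the conditions are checked, not Indy's actual source code. PickCharset and its three parameters are hypothetical names, and the function assumes the media type, the header charset, and the XML prolog encoding have already been extracted (an empty string means "not specified"). The unit names assume Delphi XE2 or later.

```delphi
uses
  System.SysUtils, System.StrUtils;

// Hypothetical helper restating the rules listed above; this is NOT Indy's code.
// AMediaType:      media type without parameters, e.g. 'text/xml'
// AHeaderCharset:  charset from the Content-Type header, '' if absent
// APrologEncoding: encoding attribute of the XML prolog, '' if absent
function PickCharset(const AMediaType, AHeaderCharset,
  APrologEncoding: string): string;
var
  IsText, IsPlusXml: Boolean;
begin
  IsText := StartsText('text/', AMediaType);
  IsPlusXml := EndsText('+xml', AMediaType);

  // Rule 1: application XML types, or non-text types ending in +xml
  if MatchText(AMediaType, ['application/xml',
       'application/xml-external-parsed-entity', 'application/xml-dtd'])
     or ((not IsText) and IsPlusXml) then
  begin
    if APrologEncoding <> '' then
      Result := APrologEncoding
    else
      Result := 'UTF-8';
  end
  // Rule 2: an explicit charset in the Content-Type header
  else if AHeaderCharset <> '' then
    Result := AHeaderCharset
  // Rule 3: text/... types without an explicit charset
  else if IsText then
  begin
    if MatchText(AMediaType, ['text/xml', 'text/xml-external-parsed-entity'])
       or IsPlusXml then
      Result := 'us-ascii'      // rule 3a
    else
      Result := 'ISO-8859-1';   // rule 3b
  end
  else
    // Rule 4: stands in for Indy's default encoding (ASCII by default)
    Result := 'us-ascii';
end;
```

Under this sketch, a response sent as text/html with no charset falls through to rule 3b and is decoded as ISO-8859-1, which turns each UTF-8 byte into its own character.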

Without seeing the actual HTTP Content-Type header, it is hard to know which condition your situation falls into. It sounds like it is falling into either #2 or #3b, which would account for the UTF-8 byte values being returned as-is if ISO-8859-1 or a similar charset is being used.

UTF8ToString() expects a UTF-8 encoded RawByteString as input, but you are passing it a UTF-16 encoded UnicodeString instead. The RTL will perform a UTF16->Ansi conversion in that situation, using a default Ansi charset for the conversion. That is why you get the compiler warning, because such a conversion can lose data.
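
If manual decoding is really wanted, one way to avoid the implicit UTF16->Ansi conversion is to keep the response as raw bytes and decode those bytes directly. The following is a minimal sketch, assuming the server is known to send UTF-8 and a Delphi version with unit scope names (XE2 or later). PostAndDecodeUtf8 and its parameter names are hypothetical; it uses the TIdHTTP.Post() overload that writes the response body to a TStream.

```delphi
uses
  System.Classes, System.SysUtils, IdHTTP;

// Hypothetical helper: posts ARequest and decodes the raw response bytes
// as UTF-8. Assumes the response body really is UTF-8 encoded.
function PostAndDecodeUtf8(AHttp: TIdHTTP; const AUrl: string;
  ARequest: TStream): string;
var
  Body: TMemoryStream;
  Raw: TBytes;
begin
  Result := '';
  Body := TMemoryStream.Create;
  try
    // This overload stores the response body in Body without decoding it.
    AHttp.Post(AUrl, ARequest, Body);
    if Body.Size > 0 then
    begin
      // Copy the undecoded bytes out of the stream.
      SetLength(Raw, Body.Size);
      Body.Position := 0;
      Body.ReadBuffer(Raw[0], Length(Raw));
      // Bytes -> UnicodeString, with no intermediate Ansi conversion.
      Result := TEncoding.UTF8.GetString(Raw);
    end;
  finally
    Body.Free;
  end;
end;
```

Because the input to the decoder is a byte array rather than a UnicodeString, no implicit string cast takes place and W1058 should not be raised.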

XML is really a binary data format, subject to charset encodings. An XML parser needs to know what the XML's encoding is, and be able to parse the raw encoded bytes accordingly. That is why XML has an explicit encoding attribute right in the XML prolog. However, when TIdHTTP downloads XML as a String, although it does automatically decode it to Unicode, it does not yet update the XML's prolog accordingly.

The real solution is to not download XML as a String in the first place. Download it as a TStream instead (TMemoryStream is a better choice than TStringStream) so your XML parser has access to the original bytes, the original charset declaration, etc. You can pass the TStream to the TXMLDocument.LoadFromStream() method, for instance.
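
A minimal sketch of that approach, assuming Delphi XE2 or later and the standard Xml.XMLDoc unit. LoadXmlFromPost and its parameter names are hypothetical. With the default MSXML DOM vendor, COM must already be initialized on the calling thread (for example via CoInitialize in a console application or worker thread).

```delphi
uses
  System.Classes, Xml.XMLIntf, Xml.XMLDoc, IdHTTP;

// Hypothetical helper: posts ARequest and lets the XML parser read the
// raw response bytes, so the prolog's encoding declaration stays valid.
function LoadXmlFromPost(AHttp: TIdHTTP; const AUrl: string;
  ARequest: TStream): IXMLDocument;
var
  Body: TMemoryStream;
begin
  Body := TMemoryStream.Create;
  try
    // This overload writes the undecoded response body into Body.
    AHttp.Post(AUrl, ARequest, Body);
    Body.Position := 0;
    // The parser detects the charset from the bytes and the XML prolog.
    Result := NewXMLDocument;
    Result.LoadFromStream(Body);
  finally
    Body.Free;
  end;
end;
```

The returned IXMLDocument is reference counted, so the caller does not need to free it explicitly.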
