Delphi & Indy & utf8
Question
I have a problem accessing websites that use the UTF-8 charset. For example, when I try to access this www,
none of the UTF-8 characters are decoded correctly. This is my access routine:
var
  Web : TIdHTTP;
  Sito : String;
  hIOHand : TIdSSLIOHandlerSocketOpenSSL;
begin
  Url := TIdURI.URLEncode(Url);
  try
    Web := TIdHTTP.Create(nil);
    hIOHand := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
    hIOHand.DefStringEncoding := IndyTextEncoding_UTF8;
    hIOHand.SSLOptions.SSLVersions := [sslvTLSv1,sslvTLSv1_1,sslvTLSv1_2,sslvSSLv2,sslvSSLv3,sslvSSLv23];
    Web.IOHandler := hIOHand;
    Web.Request.CharSet := 'utf-8';
    Web.Request.UserAgent := INET_USERAGENT; //Custom user agent string
    Web.RedirectMaximum := INET_REDIRECT_MAX; //Maximum redirects
    Web.HandleRedirects := INET_REDIRECT_MAX <> 0; //Handle redirects
    Web.ReadTimeOut := INET_TIMEOUT_SECS * 1000; //Read timeout msec
    try
      Sito := Web.Get(Url);
      Web.Disconnect;
    except
      on e : exception do
        Sito := 'ERR: ' +Url+#32+e.Message;
    end;
  finally
    Web.Free;
    hIOHand.Free;
  end;
I have tried every solution I could find, but the Sito variable always contains the wrong characters. For example, the correct value of "name" is
"name": "Aire d'adhésion du Parc national du Mercantour",
but after the Get call I get
"name": "Aire d'adhésion du Parc national du Mercantour",
Do you have any idea where my error is? Thank you all!
Answer
In Delphi 2009+, which includes XE6, string is a UTF-16 encoded UnicodeString.
You are using the overloaded version of TIdHTTP.Get() that returns a string. It decodes the text sent by the server to UTF-16 using whatever charset the response reports. If the text is not decoding properly, the response is most likely not reporting the correct charset, and decoding with the wrong charset garbles the text.
The URL in question is, in fact, sending a response Content-Type header that is set to application/json without specifying a charset at all. The default charset for application/json is UTF-8, but Indy does not know that, so it ends up using its own internal default instead, which is not UTF-8. That is why the text is not decoding properly when non-ASCII characters are present.
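To see why a missing charset only hurts non-ASCII text, here is a small console sketch that encodes "adhésion" as UTF-8 (what the server sends) and decodes the same bytes twice: once as UTF-8 and once with a wrong charset. Windows-1252 is used here purely as an illustrative wrong charset; it is an assumption for the demo, not necessarily the default Indy falls back to.

```pascal
program CharsetMismatchDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Original, AsUtf8, AsWrong: string;
  Bytes: TBytes;
  Wrong: TEncoding;
begin
  // 'adhésion', with the accented letter given as a code point so the
  // encoding of this source file does not matter
  Original := 'adh' + #$00E9 + 'sion';

  // The raw body as the server sends it: UTF-8, where U+00E9 becomes $C3 $A9
  Bytes := TEncoding.UTF8.GetBytes(Original);
  Writeln('Byte count: ', Length(Bytes)); // 9 bytes for 8 characters

  // Decoding with the charset that was actually used restores the text
  AsUtf8 := TEncoding.UTF8.GetString(Bytes);
  Writeln('UTF-8 decode matches: ', BoolToStr(AsUtf8 = Original, True));

  // Decoding the same bytes with another charset does not: Windows-1252
  // maps each of $C3 and $A9 to its own character, giving 9 characters
  Wrong := TEncoding.GetEncoding(1252); // a new instance the caller must free
  try
    AsWrong := Wrong.GetString(Bytes);
  finally
    Wrong.Free;
  end;
  Writeln('Windows-1252 decode matches: ', BoolToStr(AsWrong = Original, True));
  Writeln('Windows-1252 decode length: ', Length(AsWrong)); // 9
end.
```

Pure ASCII text would decode identically under both charsets, which is why the problem only shows up on characters such as "é".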
In this case, if you KNOW the charset will always be UTF-8, you have a few workarounds to choose from:
- you can set Indy's default charset to UTF-8 by setting the global GIdDefaultTextEncoding variable in the IdGlobal unit:
GIdDefaultTextEncoding := encUTF8;
- you can use the TIdHTTP.OnHeadersAvailable event to change the TIdHTTP.Response.Charset property to 'utf-8' if it is blank or incorrect.
Web.OnHeadersAvailable := CheckResponseCharset;
...
procedure TMyClass.CheckResponseCharset(Sender: TObject; AHeaders: TIdHeaderList; var VContinue: Boolean);
var
  Response: TIdHTTPResponse;
begin
  Response := TIdHTTP(Sender).Response;
  if IsHeaderMediaType(Response.ContentType, 'application/json') and (Response.Charset = '') then
    Response.Charset := 'utf-8';
  VContinue := True;
end;
- you can use the other overloaded version of TIdHTTP.Get() that fills an output TStream instead of returning a string. Using a TMemoryStream or TStringStream, you can decode the raw bytes yourself using UTF-8:
MStrm := TMemoryStream.Create;
try
  Web.Get(Url, MStrm);
  MStrm.Position := 0;
  // ReadStringFromStream (IdGlobal) takes a byte count before the encoding; -1 reads to the end
  Sito := ReadStringFromStream(MStrm, -1, IndyTextEncoding_UTF8);
finally
  MStrm.Free;
end;
SStrm := TStringStream.Create('', TEncoding.UTF8);
try
  Web.Get(Url, SStrm);
  Sito := SStrm.DataString;
finally
  SStrm.Free;
end;
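OnHeadersAvailable is a method-pointer ("of object") event, so outside a form or data module the handler needs an object to live on. The following is a minimal console sketch of the second workaround under that assumption; TCharsetFixer is a hypothetical host class and the URL is a placeholder. An https:// URL would additionally need an SSL IOHandler like the one in the question.

```pascal
program JsonCharsetFix;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  IdHTTP,            // TIdHTTP, TIdHTTPResponse
  IdHeaderList,      // TIdHeaderList
  IdGlobalProtocols; // IsHeaderMediaType

type
  // Hypothetical host class: it exists only so the handler is a method
  TCharsetFixer = class
  public
    procedure CheckResponseCharset(Sender: TObject; AHeaders: TIdHeaderList;
      var VContinue: Boolean);
  end;

procedure TCharsetFixer.CheckResponseCharset(Sender: TObject;
  AHeaders: TIdHeaderList; var VContinue: Boolean);
var
  Response: TIdHTTPResponse;
begin
  Response := TIdHTTP(Sender).Response;
  // JSON sent without a charset is treated as UTF-8 instead of Indy's default
  if IsHeaderMediaType(Response.ContentType, 'application/json') and
     (Response.CharSet = '') then
    Response.CharSet := 'utf-8';
  VContinue := True; // keep reading the response body
end;

var
  Fixer: TCharsetFixer;
  Web: TIdHTTP;
begin
  Fixer := TCharsetFixer.Create;
  try
    Web := TIdHTTP.Create(nil);
    try
      Web.OnHeadersAvailable := Fixer.CheckResponseCharset;
      // Placeholder URL; replace with the real endpoint
      Writeln(Web.Get('http://example.com/data.json'));
    finally
      Web.Free;
    end;
  finally
    Fixer.Free;
  end;
end.
```

The fixer must outlive every request made with the TIdHTTP it is attached to, which the nested try/finally blocks guarantee here.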
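If the stream approach is used in several places, it can be wrapped in a small function. This is a sketch only: the unit name Utf8Http and the function GetUtf8Text are hypothetical, it assumes the body is always UTF-8, and it leaves out the SSL IOHandler, timeout and redirect settings from the question.

```pascal
unit Utf8Http; // hypothetical unit name

interface

// Downloads AUrl and decodes the raw body as UTF-8, ignoring any charset
// the server did or did not report
function GetUtf8Text(const AUrl: string): string;

implementation

uses
  System.Classes, System.SysUtils, IdHTTP;

function GetUtf8Text(const AUrl: string): string;
var
  Web: TIdHTTP;
  SStrm: TStringStream;
begin
  Web := TIdHTTP.Create(nil);
  try
    SStrm := TStringStream.Create('', TEncoding.UTF8);
    try
      // The stream overload stores the body bytes without charset decoding
      Web.Get(AUrl, SStrm);
      // DataString decodes those bytes with the stream's encoding (UTF-8)
      Result := SStrm.DataString;
    finally
      SStrm.Free;
    end;
  finally
    Web.Free;
  end;
end;

end.
```

Exceptions from Get() (network errors, HTTP error codes) propagate to the caller, so the "ERR: " handling from the question would move to wherever GetUtf8Text is called.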