Node.js Unicode issue with HTTP response body

Problem description

The response body of an HTTP request made with the native 'http' module shows question marks in place of Unicode characters, instead of their actual values. Here's the basic snippet I'm running.

var http = require('http');
// http.createClient() has been removed from Node.js; http.request() replaces it.
var request = http.request({
  host: 'www.google.it',
  port: 80,
  path: '/',
  method: 'GET'
});
request.end();
request.on('response', function (response) {
  // Decodes every chunk as UTF-8, whatever charset the server actually used.
  response.setEncoding('utf8');
  response.on('data', function (chunk) {
    console.log(chunk);
  });
});

In the response there's a specific word that starts with "Pubblicit". Its last letter is a character that shows up as a question mark for me: the word should be Pubblicità, but it is displayed as Pubblicit?.

I have also tried outputting the data using .toString():

console.log(chunk.toString());

console.log(chunk.toString('utf8'));

But I get the same result.

Any ideas?

Recommended answer

The reason may be that, unless the request carries a User-Agent header that Google recognizes as able to handle UTF-8, Google responds with an HTML document whose content type is ISO-8859-1 (for old browsers or bots? I don't know). In that case, decoding the response buffer as "binary" is correct.
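One way to act on this is to read the charset from the response's Content-Type header before choosing how to decode. The sketch below is only an illustration: encodingFor is a hypothetical helper, not part of Node.js, and falling back to UTF-8 for every other charset is an assumption. In Node.js, 'binary' is an alias for the 'latin1' Buffer encoding, which corresponds to ISO-8859-1.

```javascript
// Hypothetical helper: map a Content-Type header value to a Node.js Buffer encoding.
function encodingFor(contentType) {
  // Pull out the charset parameter, e.g. "text/html; charset=ISO-8859-1".
  const match = /charset=\s*"?([^";\s]+)/i.exec(contentType || '');
  const charset = match ? match[1].toLowerCase() : '';
  if (charset === 'iso-8859-1' || charset === 'latin1') {
    return 'latin1'; // 'binary' is an alias for this encoding
  }
  return 'utf8'; // assumption: treat anything else as UTF-8
}

console.log(encodingFor('text/html; charset=ISO-8859-1'));
console.log(encodingFor('text/html; charset=UTF-8'));
```

The returned name can be passed to response.setEncoding() or to Buffer's toString().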

But if we decode a buffer encoded in ISO-8859-1 as UTF-8, the byte 0xE0 (à) is read as "the start of a three-byte character". The bytes that follow do not complete that sequence, so it is malformed, and an unexpected character (which one depends on the environment) is displayed instead.
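The effect can be reproduced without any network access. This is a minimal sketch that builds the ISO-8859-1 bytes for the word by hand (an assumption about what the server sent) and decodes them both ways; current Node.js versions substitute U+FFFD, the replacement character, for the malformed byte, and a terminal may render that as a question mark.

```javascript
// The bytes an ISO-8859-1 server would send for "Pubblicità" (à is \u00e0).
const latin1Bytes = Buffer.from('Pubblicit\u00e0', 'latin1');

// Wrong: 0xE0 is a UTF-8 lead byte with no continuation bytes after it,
// so the decoder replaces it with U+FFFD.
const wrong = latin1Bytes.toString('utf8');

// Right: decode with the encoding the bytes were actually written in.
const right = latin1Bytes.toString('latin1');

console.log(wrong);
console.log(right);
```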

We could try "Mozilla/5.0" as the User-Agent value. Good luck.
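As a rough sketch of that suggestion, the request can send the User-Agent header and decode the body once it has fully arrived. fetchBody is a hypothetical helper name, and the charset check is a simplification that assumes only ISO-8859-1 and UTF-8 occur; whether Google actually switches to UTF-8 for this User-Agent is not verified here.

```javascript
const http = require('http');

// Request options carrying the suggested User-Agent value.
const options = {
  host: 'www.google.it',
  path: '/',
  headers: { 'User-Agent': 'Mozilla/5.0' }
};

// Hypothetical helper: collect the raw chunks, then decode the body in one step.
function fetchBody(opts, callback) {
  http.get(opts, function (response) {
    const chunks = [];
    // No setEncoding() call, so each chunk is a raw Buffer.
    response.on('data', function (chunk) { chunks.push(chunk); });
    response.on('end', function () {
      const type = response.headers['content-type'] || '';
      // Assumption: anything not declared as ISO-8859-1 is treated as UTF-8.
      const encoding = /iso-8859-1/i.test(type) ? 'latin1' : 'utf8';
      callback(null, Buffer.concat(chunks).toString(encoding), type);
    });
  }).on('error', callback);
}

// Usage (performs a real network request):
// fetchBody(options, function (err, body, type) {
//   if (err) throw err;
//   console.log(type);
// });
```

Concatenating the Buffers before decoding also avoids splitting a multi-byte character across two chunks.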
