http.get and ISO-8859-1 encoded responses


Question

I'm writing an RSS feed fetcher and I'm stuck on some charset problems.

Loading and parsing the feed was quite easy compared to the encoding. I'm loading the feed with http.get and concatenating the chunks on every data event. Later I parse the whole string with the npm library feedparser, which works fine with the given string.

Sadly, I'm used to functions like utf8_encode() in PHP and I miss them in node.js, so I'm stuck with Iconv, which is currently not doing what I want.

Without any conversion, the output contains several "?" replacement characters caused by the wrong charset; with Iconv, the string is parsed incorrectly :/

Currently I'm encoding every string separately:

// Shortened version
// encoding is e.g. 'ISO-8859-1' (it is the right one, checked against the docs etc.)

var Iconv = require('iconv').Iconv;
var iconv = new Iconv(encoding, 'UTF-8');

parser.on('article', function(article){
    var object = {
        title : iconv.convert(article.title).toString('UTF-8'),
        description : iconv.convert(article.summary).toString('UTF-8')
    }
    Articles.push(object);
});

Should I convert the encoding on the data buffers as they arrive, or later on the complete string?

Thanks!

PS: The encoding is determined by parsing the head of the XML.
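
As an illustration of that step, here is a minimal sketch of reading the encoding from the XML declaration. The function name encodingFromXmlDeclaration and the 200-byte limit are assumptions made for this example, not part of the original code. It relies on the declaration itself being ASCII-compatible, which holds for ISO-8859-1 and UTF-8 feeds but not for UTF-16.

```javascript
// Hypothetical helper: extract the encoding named in the XML declaration,
// e.g. <?xml version="1.0" encoding="ISO-8859-1"?>.
// Returns the encoding name as written, or null if none is declared.
function encodingFromXmlDeclaration(buffer) {
    // Only the first bytes are needed; 200 is an arbitrary limit for this sketch.
    // Decoding them as latin1 is safe because the declaration is plain ASCII.
    var head = buffer.subarray(0, 200).toString('latin1');
    var match = /^\s*<\?xml\s[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']/.exec(head);
    return match ? match[1] : null;
}
```

If no encoding is declared, the caller has to fall back to something else, such as the charset in the Content-Type header.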

Is there a module that makes encoding in node.js easier?

Recommended answer

https://groups.google.com/group/nodejs/browse_thread/thread/b2603afa31aada9c

The solution seems to be to set the response encoding to binary before processing the Buffer with Iconv.

The relevant part is:


set response.setEncoding('binary') and aggregate the chunks into a buffer before calling Iconv.convert(). Note that encoding=binary means your data callback will receive Buffer objects, not strings.
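
A minimal sketch of the "aggregate first, convert once" idea follows. It makes some assumptions that go beyond the quoted advice. The helper names decodeBody and collectBody are hypothetical. Buffer's built-in 'latin1' decoding stands in for Iconv so that the example is self-contained; with Iconv, the concatenated Buffer would be passed to iconv.convert() instead. In recent Node.js versions, as far as the stream documentation describes it, calling setEncoding('binary') makes the data events deliver Latin-1 strings rather than Buffer objects, so this sketch does not call setEncoding() at all and works on the raw Buffers.

```javascript
// Sketch only: decodeBody and collectBody are hypothetical helper names.

// Decode a complete response body from ISO-8859-1 bytes to a JavaScript string.
// Buffer's built-in 'latin1' decoding is used here in place of Iconv.
function decodeBody(chunks) {
    // Join the raw chunks first so the body is decoded once, as a whole.
    var body = Buffer.concat(chunks);
    return body.toString('latin1');
}

// Aggregate the 'data' chunks of a response and decode once on 'end'.
// Assumes setEncoding() was NOT called, so every chunk is a Buffer.
function collectBody(response, callback) {
    var chunks = [];
    response.on('data', function (chunk) {
        chunks.push(chunk);
    });
    response.on('end', function () {
        callback(null, decodeBody(chunks));
    });
    response.on('error', callback);
}

// Usage (feedUrl is a placeholder):
// require('http').get(feedUrl, function (response) {
//     collectBody(response, function (err, xml) {
//         if (err) throw err;
//         // hand xml to the feed parser here
//     });
// });
```

Decoding the whole body once also avoids splitting a character across chunk boundaries, which matters for multi-byte source encodings.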






Update: this was my initial reply.

Are you sure that the feed you are receiving has been encoded correctly?

I can see two possible errors:


  1. The feed is being sent with Latin-1-encoded data, but with a Content-Type that states charset=UTF-8.
  2. The feed is being sent with UTF-8-encoded data, but the Content-Type header does not state anything, defaulting to ASCII.

You should check the content of your feed and the sent headers with a utility like Wireshark or cURL.
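
The same header check can also be done from within Node. The sketch below pulls the charset parameter out of a Content-Type value so it can be compared with the encoding the feed declares for itself. The function name charsetFromContentType is hypothetical, and the parsing is deliberately simple; it is not a full implementation of the header grammar.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value.
// Returns the charset in lower case, or null if the header is missing
// or does not state a charset.
function charsetFromContentType(header) {
    if (!header) {
        return null;
    }
    var match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(header);
    return match ? match[1].toLowerCase() : null;
}

// Usage inside an http.get callback:
// var charset = charsetFromContentType(response.headers['content-type']);
```

A null result corresponds to the second case above, where the header states nothing; a value that disagrees with the actual bytes of the feed corresponds to the first.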
