How to convert character encoding from CP932 to UTF-8 in nodejs javascript, using the nodejs-iconv module (or other solution)

Problem description

I'm attempting to convert a string from CP932 (aka Windows-31J) to utf8 in javascript. Basically I'm crawling a site that ignores the utf-8 request in the request header and returns cp932 encoded text (even though the html metatag indicates that the page is shift_jis).

Anyway, I have the entire page stored in a string variable called "html". From there I'm attempting to convert it to utf8 using this code:

var Iconv = require('iconv').Iconv;
var conv = new Iconv('CP932', 'UTF-8//TRANSLIT//IGNORE');

var myBuffer = new Buffer(html.length * 3);
myBuffer.write(html, 0, 'utf8')
var utf8html = (conv.convert(myBuffer)).toString('utf8');

The result is not what it's supposed to be. For example, the string: "投稿者さんの 稚内全日空ホテル のクチコミ (感想・情報)" comes out as "ソスソスソスeソスメゑソスソスソスソスソス ソスtソスソスソスSソスソスソスソスソスzソスeソスソス ソスフクソス`ソスRソス~ (ソスソスソスzソスEソスソスソスソス)"

If I remove //TRANSLIT//IGNORE (which should make it substitute similar characters for missing ones and, failing that, omit characters that cannot be transcoded), I get this error: Error: EILSEQ, Illegal character sequence.

I'm open to using any solution that can be implemented in nodejs, but my search results haven't yielded many options outside of the nodejs-iconv module.

nodejs-iconv ref: https://github.com/bnoordhuis/node-iconv

Thanks!

Edit 24.06.2011: I've gone ahead and implemented a solution in Java. However I'd still be interested in a javascript solution to this problem if somebody can solve it.

Solution

I ran into the same trouble today :)
It depends on libiconv. You need libiconv-1.13-ja-1.patch.
Please check the following.

Alternatively, you can avoid the problem by using iconv-jp:

npm install iconv-jp
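
As another option that needs no native libiconv, here is a minimal sketch using the `TextDecoder` built into Node.js. It rests on some assumptions: the Node.js build must include full ICU data (otherwise the constructor throws for this encoding), the WHATWG `shift_jis` encoding (which also answers to the `windows-31j` label) must be close enough to CP932 for the page in question, and `decodeCp932` is a hypothetical helper name. The key point is to decode the raw response bytes. The garbled output in the question looks consistent with the body having already been turned into a UTF-8 string, which replaces the CP932 bytes before iconv ever sees them.

```javascript
// Sketch: decode CP932 / Shift_JIS bytes with Node's built-in TextDecoder.
// Assumes a Node.js build with full ICU data; without it, the constructor
// throws a RangeError for 'shift_jis'.
const { TextDecoder } = require('util');

// decodeCp932 is a hypothetical helper name, not part of any library.
function decodeCp932(bytes) {
  // fatal: true makes invalid byte sequences throw a TypeError
  // instead of silently inserting U+FFFD replacement characters.
  const decoder = new TextDecoder('shift_jis', { fatal: true });
  return decoder.decode(bytes);
}

// The bytes of "日本語" as encoded in Shift_JIS.
const sample = Buffer.from([0x93, 0xfa, 0x96, 0x7b, 0x8c, 0xea]);

const text = decodeCp932(sample);         // a normal JavaScript string
const utf8Bytes = Buffer.from(text, 'utf8'); // the same text re-encoded as UTF-8

console.log(text);
console.log(utf8Bytes.length);
```

When crawling, this means collecting the response as `Buffer` chunks (for example with `Buffer.concat`) and not calling `setEncoding('utf8')` or concatenating chunks into a string before decoding.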
