Strange unicode characters when reading in file in node.js app

Question

I am attempting to write a node app that reads in a set of files, splits them into lines, and puts the lines into an array. Pretty simple. It works on quite a few files except some SQL files that I am working with. For some reason I seem to be getting some kind of unicode output when I split the lines up. The app looks something like this:

var fs = require("fs");
var data = fs.readFileSync("test.sql", "utf8");
console.log(data);
var lines = data.split("\n");
console.log(lines);

The input file looks like this:

use whatever
go

The output looks like this:

��use whatever
go

[ '��u\u0000s\u0000e\u0000 \u0000w\u0000h\u0000a\u0000t\u0000e\u0000v\u0000e\u0000r\u0000',
  '\u0000g\u0000o\u0000',
  '\u0000' ]

As you can see there is some kind of unrecognized character at the beginning of the file. After reading the data in and directly outputting it, it looks okay except for this character. However, if I then attempt to split it up into lines, I get all these unicode-like characters. Basically it's all the actual characters with "\u0000" at the beginning of each one.

I have no idea what's going on here but it appears to have something to do with the characters in the file itself. If I copy and paste the text of the file into another new file and run the app on the new file, it works fine. I assume that whatever is causing this issue is being stripped out during the copy and paste process.

Recommended answer

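Your file is encoded as UTF-16, not UTF-8. Judging by the output above, where a "\u0000" follows each character, it is little-endian, and the two unrecognized characters at the start are its byte order mark (BOM) misread as UTF-8.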

var data = fs.readFileSync("test.sql", "utf16le"); //Not sure if this eats the BOM
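
To see where the symptoms come from, here is a minimal sketch that builds the bytes of a tiny UTF-16LE file in memory and decodes them both ways. The sample text and variable names are made up for illustration. It also suggests an answer to the BOM question in the comment above: decoding with "utf16le" appears to leave the BOM in the string as a leading U+FEFF character.

```javascript
// Bytes of a small UTF-16LE file built by hand: BOM (FF FE) followed by "go\n".
const bytes = Buffer.concat([
  Buffer.from([0xff, 0xfe]),
  Buffer.from("go\n", "utf16le"),
]);

// Decoded as UTF-8, the two BOM bytes are invalid and become replacement
// characters, and the zero high byte of every code unit becomes "\u0000".
const wrong = bytes.toString("utf8");
console.log(JSON.stringify(wrong.split("\n")));

// Decoded as UTF-16LE, the text is intact but still starts with the BOM.
const right = bytes.toString("utf16le");
console.log(JSON.stringify(right));

// Dropping a leading U+FEFF gives the plain text.
const clean = right.charCodeAt(0) === 0xfeff ? right.slice(1) : right;
console.log(JSON.stringify(clean.split("\n")));
```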


Node.js's built-in Buffer encodings cover only the little-endian form of UTF-16 ("utf16le", also accepted as "ucs2"), and decoding does not remove a BOM. If the file might be big-endian, or you would rather have the BOM handled for you, use iconv or convert the file to UTF-8 some other way.

Example:

var Iconv  = require('iconv').Iconv,
    fs = require("fs");

var buffer = fs.readFileSync("test.sql"),
    iconv = new Iconv( "UTF-16", "UTF-8");

var result = iconv.convert(buffer).toString("utf8");
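
If adding a dependency is not an option, the same idea can be sketched with Node's built-in Buffer alone. This is not part of the original answer: decodeWithBom and readLines are hypothetical helper names, and the sketch assumes the file either starts with a BOM or is plain UTF-8.

```javascript
const fs = require("fs");

// Hypothetical helper: decode a Buffer as text, choosing the encoding from a
// leading byte order mark and dropping that mark from the result.
function decodeWithBom(buffer) {
  if (buffer.length >= 2 && buffer[0] === 0xff && buffer[1] === 0xfe) {
    // FF FE: UTF-16 little-endian. Start decoding after the 2-byte BOM.
    return buffer.toString("utf16le", 2);
  }
  if (buffer.length >= 2 && buffer[0] === 0xfe && buffer[1] === 0xff) {
    // FE FF: UTF-16 big-endian. There is no built-in big-endian decoder, so
    // copy the payload, swap each byte pair, and decode as little-endian.
    const evenLength = (buffer.length - 2) & ~1; // swap16 needs an even length
    const body = Buffer.from(buffer.subarray(2, 2 + evenLength));
    body.swap16();
    return body.toString("utf16le");
  }
  if (
    buffer.length >= 3 &&
    buffer[0] === 0xef && buffer[1] === 0xbb && buffer[2] === 0xbf
  ) {
    // EF BB BF: UTF-8 with a BOM.
    return buffer.toString("utf8", 3);
  }
  // No BOM: assume UTF-8 (an assumption of this sketch).
  return buffer.toString("utf8");
}

// Hypothetical helper: read a file and split it into lines, accepting both
// "\n" and "\r\n" line endings.
function readLines(path) {
  return decodeWithBom(fs.readFileSync(path)).split(/\r?\n/);
}
```

Calling readLines("test.sql") would then stand in for the readFileSync and split pair in the question. A UTF-16 file saved without a BOM would still be misread as UTF-8 by this sketch, so iconv remains the safer choice when the encoding is known in advance.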
