UCS-2 Little Endian to UTF-8 conversion leaves file with many unwanted characters

Question

I have a script that I put together after going over many different ways that I could do an encoding conversion using ADODB in VBScript.

Option Explicit

Sub UTFConvert()
    Dim objFSO, objStream, file

    file = "FileToConvert.csv"

    Set objStream = CreateObject( "ADODB.Stream" )
    objStream.Open
    objStream.Type = 2
    objStream.Position = 0
    objStream.Charset = "utf-8"
    objStream.LoadFromFile file
    objStream.SaveToFile file, 2
    objStream.Close
    Set objStream = Nothing
End Sub

UTFConvert

The file is supposed to be converted from UCS-2 Little Endian, or whichever readable format it is in (within limitations), to UTF-8. The issue however is that once this file has finished converting to UTF-8 there are many NUL symbols throughout the entire file before and after every letter, and xFF xFE (UCS-2 LE BOM) at the start of the file. These are visible without needing to use any symbol visualization toggles. Any help would be appreciated in understanding where I may be limited with this conversion. Or any alternative approach I can take.

Recommended answer

Your Stream object is loading the file as a UTF-8 encoded file, thus misinterpreting the byte sequences. Read the file using a FileSystemObject instance and write it with the ADODB.Stream object:

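Sub UTFConvert(filename)
  Dim fso, txt, stream

  ' Read the file as Unicode (UTF-16): 1 = ForReading, False = do not create, -1 = TristateTrue
  Set fso = CreateObject("Scripting.FileSystemObject")
  txt = fso.OpenTextFile(filename, 1, False, -1).ReadAll

  ' Write the decoded text back out as UTF-8
  Set stream = CreateObject("ADODB.Stream")
  stream.Open
  stream.Type     = 2 'text
  stream.Position = 0
  stream.Charset  = "utf-8"
  stream.WriteText txt
  stream.SaveToFile filename, 2 '2 = overwrite an existing file
  stream.Close
End Sub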
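
To see why the NUL bytes and the xFF xFE prefix show up when the bytes are not decoded as UTF-16 first, here is a minimal sketch. It is not part of the original answer: it is written in Python rather than VBScript only because Python makes the raw bytes easy to inspect, and the sample text and the function name reencode_utf16_to_utf8 are hypothetical.

```python
import codecs

def reencode_utf16_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 bytes (BOM selects the byte order) and re-encode as UTF-8."""
    # The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
    return raw.decode("utf-16").encode("utf-8")

# Hypothetical sample: one short CSV line stored as UCS-2/UTF-16 LE with a BOM,
# which is how the question describes the input file.
sample_text = "a,b\r\n"
utf16_bytes = codecs.BOM_UTF16_LE + sample_text.encode("utf-16-le")

# Every ASCII character is stored as two bytes, the second being 0x00,
# and the file starts with FF FE.
print(utf16_bytes)

# Treating those same bytes as UTF-8 keeps all the NULs, and the BOM bytes
# are not valid UTF-8 at all (shown here as replacement characters).
misread = utf16_bytes.decode("utf-8", errors="replace")
print(repr(misread))

# Decoding as UTF-16 first yields clean UTF-8 with no NULs and no FF FE.
utf8_bytes = reencode_utf16_to_utf8(utf16_bytes)
print(utf8_bytes)
```

The point of the sketch is the order of operations: the text has to be decoded with its real source encoding before it is written out as UTF-8, which is what the FileSystemObject read in the answer takes care of.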
