UCS-2 Little Endian to UTF-8 conversion leaves file with many unwanted characters
Problem description
I put this script together after going through many different ways of doing an encoding conversion with ADODB in VBScript.
Option Explicit
Sub UTFConvert()
    Dim objFSO, objStream, file
    file = "FileToConvert.csv"
    Set objStream = CreateObject( "ADODB.Stream" )
    objStream.Open
    objStream.Type = 2
    objStream.Position = 0
    objStream.Charset = "utf-8"
    objStream.LoadFromFile file
    objStream.SaveToFile file, 2
    objStream.Close
    Set objStream = Nothing
End Sub
UTFConvert
The file is supposed to be converted to UTF-8 from UCS-2 Little Endian, or from whichever readable format it is in (within limitations). The issue, however, is that once the conversion has finished there are many NUL symbols throughout the entire file, before and after every letter, and xFF xFE (the UCS-2 LE BOM) at the start of the file. These are visible without needing to turn on any symbol visualization toggles. Any help in understanding where I may be limited with this conversion, or any alternative approach I could take, would be appreciated.
Recommended answer
Your Stream object is loading the file as a UTF-8 encoded file, thus misinterpreting the byte sequences. Read the file using a FileSystemObject instance, opening it as Unicode (the -1 format argument), and write it with the ADODB.Stream object:
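Sub UTFConvert(filename)
    Dim fso, txt, stream

    ' Read the file as Unicode (UTF-16 LE): 1 = ForReading, -1 = TristateTrue.
    Set fso = CreateObject("Scripting.FileSystemObject")
    txt = fso.OpenTextFile(filename, 1, False, -1).ReadAll

    ' Write the decoded text back to the same file as UTF-8.
    Set stream = CreateObject("ADODB.Stream")
    stream.Open
    stream.Type = 2 'text
    stream.Position = 0
    stream.Charset = "utf-8"
    stream.WriteText txt
    stream.SaveToFile filename, 2 '2 = overwrite an existing file
    stream.Close
End Sub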
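
As an alternative to mixing FileSystemObject and ADODB.Stream, the same read-then-write idea can probably be expressed with two ADODB.Stream objects: one that decodes the source bytes with the "unicode" charset (ADODB's name for UTF-16 LE) and one that re-encodes the text as UTF-8. This is an untested sketch rather than part of the original answer; the helper name ConvertUtf16ToUtf8 and the output file name are made up for illustration, and writing to a separate target file is an assumption made so the source stays untouched.

```vbscript
Option Explicit

' Hypothetical helper (not from the original answer): convert a UTF-16 LE
' ("UCS-2 Little Endian") text file to UTF-8 using only ADODB.Stream.
Sub ConvertUtf16ToUtf8(sourcePath, targetPath)
    Const adTypeText = 2
    Const adSaveCreateOverWrite = 2
    Dim reader, writer, txt

    ' Read: declare the source encoding so the bytes are decoded correctly.
    Set reader = CreateObject("ADODB.Stream")
    reader.Type = adTypeText
    reader.Charset = "unicode"      ' UTF-16 LE, the encoding of the input file
    reader.Open
    reader.LoadFromFile sourcePath
    txt = reader.ReadText           ' no argument: read the whole stream
    reader.Close

    ' Write: re-encode the decoded string as UTF-8.
    Set writer = CreateObject("ADODB.Stream")
    writer.Type = adTypeText
    writer.Charset = "utf-8"
    writer.Open
    writer.WriteText txt
    writer.SaveToFile targetPath, adSaveCreateOverWrite
    writer.Close
End Sub

' Assumed file names, for illustration only.
ConvertUtf16ToUtf8 "FileToConvert.csv", "FileToConvert.utf8.csv"
```

The key difference from the script in the question is that the charset used while loading matches the file's actual encoding; setting "utf-8" before LoadFromFile only tells the stream how to interpret the bytes, it does not convert them.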