VBA Output to file using UTF-16

Question

I have a very complex problem that is difficult to explain properly. There is a lot of discussion about this across the internet, but nothing definitive. Any help, or a better explanation than mine, is greatly appreciated.

Essentially, I'm just trying to write an XML file using UTF-16 with VBA.

If I do this:

sXML = "<?xml version='1.0' encoding='utf-8'?>"
sXML = sXML & rest_of_xml_document
Print #iFile, sXML

then I get a file that is valid XML. However, if I change the encoding attribute to "utf-16", I get this error from my XML validator:

Switch from current encoding to specified encoding not supported.

Googling tells me that this means the XML encoding attribute is different from the ACTUAL encoding used by the file, hence I must be creating a UTF-8 document via the Open and Print commands.

If I do this:

With CreateObject("ADODB.Stream")
  .Type = 2
  .Charset = "utf-16"
  .Open
  .WriteText sXML
  .SaveToFile sFilename, 2
  .Close
End With

then I end up with some funky characters (the BOM) at the beginning of my file, which causes it to fail XML validation.

If I open the file in Notepad++, delete the BOM and change the encoding to "UCS-2", then the file validates fine with a "utf-16" encoding value (meaning either that UCS-2 is close enough to UTF-16 that it doesn't matter, or that XML is able to switch between these two encodings).

I need to use UTF-16 because UTF-8 doesn't cover all the characters used in the presentations I'm exporting.

Question:

How can I get VBA to behave like Notepad++, creating a UTF-16-encoded text file without a BOM that can be filled with XML data? ANY help much appreciated!

Recommended answer

Your point about UTF-8 not being able to store all the characters you need is invalid.
UTF-8 can store every character defined in the Unicode standard.
The only difference is that, for text in certain languages, UTF-8 can take more space to store its code points than, say, UTF-16. The opposite is also true: for certain other languages, such as English, using UTF-8 saves space.

VB6 and VBA, although they store strings in memory as Unicode, implicitly switch to ANSI (using the current system code page) when doing file I/O. The resulting file you get is NOT in UTF-8. It is in your current system code page, which, as you can discover in this helpful article, looks just like UTF-8 if you're from the USA (plain ASCII characters are encoded identically in both).

Try:

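Dim s As String
s = "<?xml version='1.0' encoding='utf-16'?>"
' Append a few non-ASCII (Cyrillic) characters as test data.
s = s & ChrW$(&H43F&) & ChrW$(&H440&) & ChrW$(&H43E&) & ChrW$(&H432&) & ChrW$(&H435&) & ChrW$(&H440&) & ChrW$(&H43A&) & ChrW$(&H430&)

' Assigning a String to a Byte array copies its in-memory UTF-16LE bytes (no BOM).
Dim b() As Byte
b = s

' Binary mode writes the bytes as they are, with no ANSI conversion.
Open "Unicode.txt" For Binary Access Write As #1
Put #1, , b
Close #1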
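The snippet above produces a file with no byte order mark. Note that the XML 1.0 specification says UTF-16 documents must begin with a BOM, so some parsers may want one after all. The following is a minimal sketch, not part of the original answer, that makes the BOM optional. The helper name `WriteUtf16File` and its parameters are hypothetical, and it assumes little-endian UTF-16, which is what VBA strings hold in memory.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): writes sText as UTF-16LE,
' optionally preceded by the FF FE byte order mark.
Public Sub WriteUtf16File(ByVal sPath As String, ByVal sText As String, ByVal bWithBom As Boolean)
    Dim bom(0 To 1) As Byte
    Dim b() As Byte
    Dim iFile As Integer

    ' Open ... For Binary does not truncate an existing file,
    ' so delete any previous copy to avoid stale trailing bytes.
    If Len(Dir$(sPath)) > 0 Then Kill sPath

    iFile = FreeFile
    Open sPath For Binary Access Write As #iFile

    If bWithBom Then
        bom(0) = &HFF   ' UTF-16 little-endian BOM is FF FE
        bom(1) = &HFE
        Put #iFile, , bom
    End If

    If Len(sText) > 0 Then
        b = sText       ' raw UTF-16LE bytes of the string
        Put #iFile, , b
    End If

    Close #iFile
End Sub
```

A call such as `WriteUtf16File "Unicode.xml", s, True` would write the BOM followed by the text; passing `False` should give the same bytes as the snippet above.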

And if you absolutely must have UTF-8, you can make some yourself:

Option Explicit

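' 64-bit Office (VBA7) requires PtrSafe and a pointer-sized type for StrPtr values.
#If VBA7 Then
Private Declare PtrSafe Function WideCharToMultiByte Lib "kernel32.dll" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long, ByRef lpMultiByteStr As Byte, ByVal cchMultiByte As Long, ByVal lpDefaultChar As String, ByRef lpUsedDefaultChar As Long) As Long
#Else
Private Declare Function WideCharToMultiByte Lib "kernel32.dll" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByRef lpMultiByteStr As Byte, ByVal cchMultiByte As Long, ByVal lpDefaultChar As String, ByRef lpUsedDefaultChar As Long) As Long
#End If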

Private Const CP_UTF8 As Long = 65001
Private Const ERROR_INSUFFICIENT_BUFFER As Long = 122&


Public Function ToUTF8(s As String) As Byte()

  If Len(s) = 0 Then Exit Function


  ' First call with a null output buffer: returns the number of bytes required.
  Dim ccb As Long
  ccb = WideCharToMultiByte(CP_UTF8, 0, StrPtr(s), Len(s), ByVal 0&, 0, vbNullString, ByVal 0&)

  If ccb = 0 Then
    Err.Raise 5, , "Internal error."
  End If

  Dim b() As Byte
  ReDim b(1 To ccb)

  If WideCharToMultiByte(CP_UTF8, 0, StrPtr(s), Len(s), b(LBound(b)), ccb, vbNullString, ByVal 0&) = 0 Then
    Err.Raise 5, , "Internal error."
  Else
    ToUTF8 = b
  End If

End Function

Sub Test()
  Dim s As String
  s = "<?xml version='1.0' encoding='utf-8'?>"
  s = s & ChrW$(&H43F&) & ChrW$(&H440&) & ChrW$(&H43E&) & ChrW$(&H432&) & ChrW$(&H435&) & ChrW$(&H440&) & ChrW$(&H43A&) & ChrW$(&H430&)

  Dim b() As Byte
  b = ToUTF8(s)

  Open "utf-8.txt" For Binary Access Write As #1
  Put #1, , b
  Close #1
End Sub
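For completeness, the ADODB.Stream attempt from the question can probably be made to work as well, by copying everything after the BOM into a second, binary stream. This is only a sketch under stated assumptions: the helper name `SaveUtf16NoBom` is hypothetical, and it assumes the "utf-16" charset emits a two-byte BOM (FF FE) at the start of the stream, as described in the question.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): saves sText as UTF-16
' without a BOM by skipping the first two bytes of an ADODB text stream.
Public Sub SaveUtf16NoBom(ByVal sText As String, ByVal sFilename As String)
    Const adTypeBinary As Long = 1
    Const adTypeText As Long = 2
    Const adSaveCreateOverWrite As Long = 2
    Const BOM_LENGTH As Long = 2   ' assumed: FF FE written by the "utf-16" charset

    Dim textStream As Object
    Dim binStream As Object

    Set textStream = CreateObject("ADODB.Stream")
    textStream.Type = adTypeText
    textStream.Charset = "utf-16"
    textStream.Open
    textStream.WriteText sText

    ' The stream type can only be changed while Position is 0.
    textStream.Position = 0
    textStream.Type = adTypeBinary
    textStream.Position = BOM_LENGTH   ' skip the BOM

    ' Copy the remaining bytes into a binary stream and save that.
    Set binStream = CreateObject("ADODB.Stream")
    binStream.Type = adTypeBinary
    binStream.Open
    textStream.CopyTo binStream
    binStream.SaveToFile sFilename, adSaveCreateOverWrite

    binStream.Close
    textStream.Close
End Sub
```

Compared with the Byte-array approach, this keeps the encoding name in one place (the `Charset` property), at the cost of depending on ADO being available.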
