VB6: I Can't Figure Out Why This Code Works

Question

I apologize for this silly question. I am maintaining old legacy VB6 code, and I have a function that actually works, but I simply can't figure out why it works, or why the code doesn't work without it.

Basically, this function reads a UTF-8 text file and displays its contents in a DHTMLEdit component. It does so by reading the entire file into a string, converting that string from double byte to multibyte using the ANSI code page, and then converting it back to double byte.

With this elaborate mechanism in place, the component correctly displays a page that contains Hebrew, Arabic, Thai and Chinese, all at the same time. Without it, the text looks as if it had been converted down to ASCII, with various punctuation marks where the letters used to be.

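What I don't understand is:

  1. Since the original file is UTF-8 and VB6 strings are UTF-16, why is this needed at all? Why can't VB6 read the string from the file correctly without all these conversions?
  2. If the function converts from wide char to multibyte using CodePage = 0 (ANSI), wouldn't that eliminate any characters the current code page doesn't support? I don't even have Chinese, Thai or Arabic installed on this station. And yet this is the only way to get the DHTMLEdit control to display the text correctly.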

[Code]

Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal codePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpDefaultChar As Long, lpUsedDefaultChar As Long) As Long
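Private Declare Function GetACP Lib "kernel32" () As Long

' Code page identifier for UTF-8, passed to DecodeString below.
Private Const CP_UTF8 As Long = 65001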


...
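Open filePath For Input As #lFilePtr
Dim sInput    As String
Dim sResult   As String

' Line Input converts each byte to UTF-16 through the system ANSI code page
' and strips the line terminator from every line it reads.
Do While Not EOF(lFilePtr)
    Line Input #lFilePtr, sInput
    sResult = sResult & sInput
Loop
Close #lFilePtr
txtBody.DOM.Body.innerText = DecodeString(sResult, CP_UTF8)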

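Public Function DecodeString(ByVal strSource As String, Optional FromCodePage As Long = -1) As String
    Dim strTemp As String

    If strSource = vbNullString Then Exit Function
    ' Step 1: convert back to bytes using code page 0 (CP_ACP, the system ANSI code page).
    strTemp = UnicodeToAnsi(strSource, 0)
    ' Step 2: reinterpret those bytes using the code page the file was really written in.
    DecodeString = AnsiToUnicode(strTemp, FromCodePage)
End Function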

Public Function AnsiToUnicode(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long

    If codePage = -1 Then codePage = GetACP()
    pwz = StrPtr(strSource)
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, 0&, 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    cwch = MultiByteToWideChar(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer))
    AnsiToUnicode = Left(strBuffer, cwch - 1)
End Function

Public Function UnicodeToAnsi(ByVal strSource As String, Optional ByVal codePage As Long = -1, Optional lFlags As Long = 0) As String
    Dim strBuffer As String
    Dim cwch As Long
    Dim pwz As Long
    Dim pwzBuffer As Long

    If codePage = -1 Then codePage = GetACP()
    pwz = StrPtr(strSource)
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, 0&, 0&, ByVal 0&, ByVal 0&)
    strBuffer = String$(cwch + 1, vbNullChar)
    pwzBuffer = StrPtr(strBuffer)
    cwch = WideCharToMultiByte(codePage, lFlags, pwz, -1, pwzBuffer, Len(strBuffer), ByVal 0&, ByVal 0&)
    UnicodeToAnsi = Left(strBuffer, cwch - 1)
End Function

[Code]

Recommended answer

VB6 and VBA perform an implicit two-way UTF-16/ANSI conversion when reading or writing files with the built-in statements.

Line Input treats the file as ANSI text (a series of bytes, each representing one character) in the current system code page for non-Unicode programs. The characters it reads are converted to UTF-16.
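
To see what this implicit conversion leaves in the string, it can help to dump the string's UTF-16 code units. The following is only a minimal sketch; DumpCodeUnits is a hypothetical helper name and is not part of the original code.

```vb
' Hypothetical helper: lists the UTF-16 code units of a string as 4-digit hex values.
Public Function DumpCodeUnits(ByVal s As String) As String
    Dim i As Long
    Dim result As String

    For i = 1 To Len(s)
        ' AscW returns a signed Integer, so mask it to get an unsigned value.
        result = result & Right$("000" & Hex$(AscW(Mid$(s, i, 1)) And &HFFFF&), 4) & " "
    Next i
    DumpCodeUnits = RTrim$(result)
End Function
```

For example, if the file contains the UTF-8 bytes C3 A9 (the letter é) and the system code page is Windows-1252, calling Debug.Print DumpCodeUnits(sInput) after Line Input should print 00C3 00A9: each byte of the file has become a character of its own.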

When you read a UTF-8 file this way, what you get is an "invalid" string: you can't use it directly in the language (if you try, you will see garbage), but it contains usable binary data.

The pointer to that usable binary data is then passed to WideCharToMultiByte (in UnicodeToAnsi), which creates another "invalid" string, this time containing "ANSI" data. Effectively, this reverses the conversion VB performs automatically in Line Input. Because the original file was UTF-8, you now have an "invalid" string holding the file's UTF-8 bytes, even though the conversion function thought it was converting to ANSI. On a single-byte system code page nothing should be lost in this step, because every character in the first string came from one byte of that code page and maps back to the same byte.
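
The same reversal can be made explicit with a byte array instead of a string. The sketch below assumes that StrConv with vbFromUnicode, called without a locale argument, uses the same system ANSI code page that Line Input used; RecoverFileBytes is a hypothetical name chosen for illustration.

```vb
' Hypothetical helper: turns a string produced by Line Input back into
' the bytes that were read from the file.
Public Function RecoverFileBytes(ByVal strSource As String) As Byte()
    Dim bytes() As Byte

    ' vbFromUnicode converts UTF-16 to the system ANSI code page,
    ' undoing the conversion that Line Input applied.
    bytes = StrConv(strSource, vbFromUnicode)
    RecoverFileBytes = bytes
End Function
```

For a file read on a single-byte code page, the returned array should match the file's bytes, minus the line terminators that Line Input strips.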

The pointer to that second invalid string is passed to MultiByteToWideChar (in AnsiToUnicode), this time with the UTF-8 code page, which finally creates a valid string that can be used in VB.

The confusing part of this code is that strings are used to hold the "invalid" data. Logically, all of these should have been byte arrays. I would refactor the code to read the file's bytes in binary mode and pass the array to MultiByteToWideChar directly.
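
One way that refactoring might look is sketched below. It is an illustration under stated assumptions rather than the answerer's actual code: ReadUtf8File is a hypothetical name, the file is assumed to fit in memory, and the Declare and constant are repeated only to keep the sketch self-contained.

```vb
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal codePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Hypothetical helper: reads a whole file as raw bytes and decodes it as UTF-8.
Public Function ReadUtf8File(ByVal filePath As String) As String
    Dim fileNum As Integer
    Dim bytes() As Byte
    Dim byteCount As Long
    Dim charCount As Long
    Dim result As String

    ' Binary mode performs no implicit ANSI conversion.
    fileNum = FreeFile
    Open filePath For Binary Access Read As #fileNum
    byteCount = LOF(fileNum)
    If byteCount > 0 Then
        ReDim bytes(0 To byteCount - 1)
        Get #fileNum, , bytes
    End If
    Close #fileNum

    If byteCount = 0 Then Exit Function

    ' First call: ask how many UTF-16 code units the decoded text needs.
    charCount = MultiByteToWideChar(CP_UTF8, 0&, VarPtr(bytes(0)), byteCount, 0&, 0&)
    If charCount = 0 Then Exit Function

    ' Second call: decode directly into the result string's buffer.
    result = String$(charCount, vbNullChar)
    MultiByteToWideChar CP_UTF8, 0&, VarPtr(bytes(0)), byteCount, StrPtr(result), charCount
    ReadUtf8File = result
End Function
```

With such a helper, the call site would reduce to something like txtBody.DOM.Body.innerText = ReadUtf8File(filePath). Two differences from the original are worth keeping in mind: line breaks are preserved, because nothing strips them as Line Input does, and a UTF-8 byte order mark at the start of the file, if present, is decoded as the character U+FEFF and may need to be removed.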
