PowerBuilder: ImportFile of UTF-8 (Converting UTF-8 to ANSI)

Question

My PowerBuilder version is 6.5. I cannot use a higher version, as this is the one I am supporting.

My problem is that when I call dw_1.ImportFile(file), the first column of the first row starts with a funny string.
I did not understand this until I opened the file, saved it as a new text file, and imported that new file instead, which worked flawlessly without the funny string.

My conclusion is that this happens because the original file is UTF-8 (as shown in Notepad++) and the new file is ANSI. The file I am trying to import is delivered automatically by a third party, and my users do not want the extra work of re-saving it each time.

How do I force-convert these files to ANSI in PowerBuilder? If there is no way, I might have to do a command prompt conversion. Any ideas?

Recommended answer

The weird characters (ï»¿, which is how the three bytes EF BB BF look when read as Windows-1252) are the optional UTF-8 BOM. It tells editors that the file is UTF-8 encoded, which can otherwise be hard to detect unless the file contains a character above code 127. You cannot simply strip the BOM and be done: if the file contains any character above 127 (accents or other special characters), you will still get garbage in the displayed data (for example: é -> Ã©, € -> â‚¬, ...), because each special character turns into 2 to 4 garbage characters.
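The effect described above can be reproduced outside PowerBuilder. The following is a minimal illustrative sketch in Python (not part of the original answer), assuming the ANSI code page is Windows-1252 (`cp1252`); the variable names are made up for the example. It decodes UTF-8 bytes as if they were ANSI, which is what an ANSI-only import effectively does.

```python
import codecs

# Assumption: the "ANSI" code page on the machine is Windows-1252.
CP = "cp1252"

# The UTF-8 BOM is the three bytes EF BB BF; read as cp1252 they become 3 characters.
bom_as_ansi = codecs.BOM_UTF8.decode(CP)

# Non-ASCII characters are stored as 2 to 4 bytes in UTF-8,
# so each one shows up as several garbage characters when read as cp1252.
e_acute_as_ansi = "é".encode("utf-8").decode(CP)
euro_as_ansi = "€".encode("utf-8").decode(CP)

print(repr(bom_as_ansi), repr(e_acute_as_ansi), repr(euro_as_ansi))
```

This is why skipping the first three bytes alone is only enough for files that happen to contain nothing but ASCII.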

I recently needed to convert some UTF-8 encoded strings to "ANSI" Windows-1252 encoding. With PB 10 and later, re-encoding between UTF-8 and ANSI is as simple as

b = blob(s, encodingutf8!)
s2 = string(b, encodingansi!)

But string() and blob() do not accept an encoding argument before release 10 of PB.

What you can do is read the file yourself, skip the BOM, ask Windows to convert the string encoding via MultiByteToWideChar() + WideCharToMultiByte(), and load the converted string into the DataWindow with ImportString().
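To make the steps concrete, here is a small Python sketch of the same pipeline (skip the BOM, decode UTF-8 into a Unicode pivot, re-encode as ANSI). It is only an illustration and not the PowerBuilder code: the function name `utf8_bytes_to_ansi` and its `ansi_codepage` parameter are hypothetical, and it assumes the target ANSI code page is Windows-1252. Characters that do not exist in the target code page are replaced with `?` here.

```python
import codecs


def utf8_bytes_to_ansi(data: bytes, ansi_codepage: str = "cp1252") -> bytes:
    """Re-encode UTF-8 bytes as ANSI bytes (hypothetical helper for illustration)."""
    # Step 1: skip the BOM, but only if it is actually present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Step 2: UTF-8 -> Unicode text (the role of the wide-char pivot).
    text = data.decode("utf-8")
    # Step 3: Unicode text -> ANSI; unmappable characters become '?'.
    return text.encode(ansi_codepage, errors="replace")


sample = codecs.BOM_UTF8 + "é;€".encode("utf-8")
print(utf8_bytes_to_ansi(sample))
```

Checking for the BOM before skipping it keeps the helper safe for UTF-8 files that were saved without one.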

Proof of concept to get the file contents (with this reading method, the file cannot be bigger than 2GB):

string ls_path, ls_file, ls_chunk, ls_ansi
ls_path = sle_path.text
int li_file
if not fileexists(ls_path) then return

li_file = FileOpen(ls_path, streammode!)
if li_file > 0 then
    FileSeek(li_file, 3, FromBeginning!) //skip the utf-8 BOM

    //read the file by blocks, FileRead is limited to 32kB
    do while FileRead(li_file, ls_chunk) > 0
        ls_file += ls_chunk //concatenate in loop works but is not so performant
    loop

    FileClose(li_file)

    ls_ansi = utf8_to_ansi(ls_file)
    dw_tab.importstring( text!, ls_ansi)
end if
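
The proof of concept concatenates all chunks before converting, so a multi-byte character that straddles two 32kB reads is never converted in two halves. If the conversion were instead done outside PowerBuilder (the "command prompt conversion" mentioned in the question), the same concern applies. A minimal Python sketch of a chunked conversion, assuming Windows-1252 as the target; `convert_stream` and its parameters are hypothetical names for this example:

```python
import codecs
import io


def convert_stream(src, dst, chunk_size=32 * 1024, target="cp1252"):
    """Copy UTF-8 bytes from src to dst, re-encoded as `target`, chunk by chunk."""
    # "utf-8-sig" drops a leading BOM if there is one and does nothing otherwise.
    decoder = codecs.getincrementaldecoder("utf-8-sig")()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # An incomplete multi-byte sequence at the end of a chunk is buffered
        # by the decoder and completed with the next chunk.
        text = decoder.decode(chunk)
        dst.write(text.encode(target, errors="replace"))
    # Final flush; raises UnicodeDecodeError if the input ends mid-character.
    dst.write(decoder.decode(b"", final=True).encode(target, errors="replace"))


# Usage with in-memory streams; a tiny chunk size forces splits inside characters.
source = io.BytesIO(codecs.BOM_UTF8 + "café €".encode("utf-8"))
result = io.BytesIO()
convert_stream(source, result, chunk_size=1)
print(result.getvalue())
```

With real files, `src` and `dst` would be file objects opened in binary mode.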

utf8_to_ansi() is a global function. It was written for PB9, but it should work the same with PB6.5:

global type utf8_to_ansi from function_object
end type

type prototypes
function ulong MultiByteToWideChar(ulong CodePage, ulong dwflags, ref string lpmultibytestr, ulong cchmultibyte, ref blob lpwidecharstr, ulong cchwidechar) library "kernel32.dll"
function ulong WideCharToMultiByte(ulong CodePage, ulong dwFlags, ref blob lpWideCharStr, ulong cchWideChar, ref string lpMultiByteStr, ulong cbMultiByte, ref string lpDefaultChar, ref boolean lpUsedDefaultChar) library "kernel32.dll"
end prototypes

forward prototypes
global function string utf8_to_ansi (string as_utf8)
end prototypes

global function string utf8_to_ansi (string as_utf8);

//convert utf-8 -> ansi
//use a wide-char native string as pivot

constant ulong CP_ACP = 0
constant ulong CP_UTF8 = 65001

string ls_wide, ls_ansi, ls_null
blob lbl_wide
ulong ul_len
boolean lb_flag

setnull(ls_null)
lb_flag = false

//get utf-8 string length converted as wide-char
setnull(lbl_wide)
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, 0)
//allocate buffer to let windows write into
ls_wide = space(ul_len * 2)
lbl_wide = blob(ls_wide)
//convert utf-8 -> wide char
ul_len = multibytetowidechar(CP_UTF8, 0, as_utf8, -1, lbl_wide, ul_len)
//get the final ansi string length
setnull(ls_ansi)
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, 0, ls_null, lb_flag)
//allocate buffer to let windows write into
ls_ansi = space(ul_len)
//convert wide-char -> ansi
ul_len = widechartomultibyte(CP_ACP, 0, lbl_wide, -1, ls_ansi, ul_len, ls_null, lb_flag)

return ls_ansi
end function
