Question about Hebrew ANSI to Unicode


Question

I have a Hebrew ANSI text file that I need to convert to a Unicode Hebrew file. The conversion runs, but I am not getting the expected output. Please let me know how to do it.

What I have tried:

//code page
int nlanguageCodePage = this->GetCodepage(lpszOldFileName);

while (fgets(chAnsiBuff, NMLANG_MaxNBuf, pFile) != NULL)
{
    sUnicodeBuff = chAnsiBuff;

    //CONVERTING TO UNICODE
    nSize = MultiByteToWideChar(nlanguageCodePage, 0, sUnicodeBuff, -1, NULL, NULL);
    MultiByteToWideChar(nlanguageCodePage, 0, sUnicodeBuff, -1, chUniocodeBuff, nSize);

    // bom at starting
    if (nBOM == 0) { arcOut.Write(&bom, 2); }
    arcOut.WriteString(chUniocodeBuff);

    nBOM++;
}
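For reference, here is a minimal portable sketch of what the conversion is supposed to produce, assuming the input file uses Windows-1255, the usual Hebrew ANSI code page on Windows (the question obtains the code page from GetCodepage(), so the exact value is an assumption). It is not a replacement for MultiByteToWideChar: the helpers cp1255_byte_to_utf16 and decode_cp1255_partial are hypothetical, only ASCII and the Hebrew letters at 0xE0 to 0xFA (U+05D0 alef to U+05EA tav) are mapped, and every other byte becomes U+FFFD. char16_t stands in for the Windows WCHAR type.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Windows API): maps one Windows-1255 byte to a
// UTF-16 code unit. Only a subset of the code page is covered:
//   0x00-0x7F -> the same ASCII code points
//   0xE0-0xFA -> Hebrew letters U+05D0 (alef) .. U+05EA (tav), incl. final forms
// Every other byte (punctuation, niqqud, currency, ...) becomes U+FFFD here;
// a real converter (MultiByteToWideChar with code page 1255) maps those too.
char16_t cp1255_byte_to_utf16(unsigned char b)
{
    if (b < 0x80)
        return static_cast<char16_t>(b);
    if (b >= 0xE0 && b <= 0xFA)
        return static_cast<char16_t>(0x05D0 + (b - 0xE0));
    return u'\uFFFD'; // replacement character for bytes this sketch does not map
}

// Decodes a whole ANSI line. A single-byte code page yields exactly one
// UTF-16 unit per input byte.
std::u16string decode_cp1255_partial(const std::string& ansi)
{
    std::u16string out;
    out.reserve(ansi.size());
    for (char c : ansi)
        out.push_back(cp1255_byte_to_utf16(static_cast<unsigned char>(c)));
    assert(out.size() == ansi.size());
    return out;
}
```

For example, the four Windows-1255 bytes F9 EC E5 ED decode to U+05E9 U+05DC U+05D5 U+05DD (shin, lamed, vav, final mem). If the converted file shows different characters, the input probably is not in the code page passed to the conversion.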

Recommended answer

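You are using the same buffer for input and output. That won't work: pass the ANSI buffer as the input and a separate wide-character buffer as the output. See the documentation of the MultiByteToWideChar function (Windows).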

It should be like this:
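// First call: query the required size in WCHARs (including the terminating null)
int nSize = MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, NULL, 0);
// Separate output buffer for the wide characters
LPWSTR sUnicodeBuff = new WCHAR[nSize];
// Second call: convert chAnsiBuff into sUnicodeBuff
MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, sUnicodeBuff, nSize);
// Use sUnicodeBuff here
delete [] sUnicodeBuff;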
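The two-call pattern in the snippet above (ask for the size, allocate, convert into a separate buffer) can also be written without manual new/delete. Because MultiByteToWideChar exists only on Windows, this portable sketch uses a hypothetical stand-in, to_wide_sketch, that follows the same calling convention (a null output buffer returns the required size including the terminating null) but decodes only ASCII. std::vector owns the output buffer, so nothing has to be deleted by hand.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in mimicking how the answer calls
// MultiByteToWideChar(cp, 0, in, -1, out, outLen):
//  - `in` is a null-terminated byte string (like passing -1 as its length),
//  - if `out` is null, return the required size in UTF-16 units, including
//    the terminating null,
//  - otherwise write into `out` (a buffer separate from `in`) and return the
//    number of units written, or 0 if `outLen` is too small.
// Only ASCII is decoded; bytes >= 0x80 become U+FFFD.
int to_wide_sketch(const char* in, char16_t* out, int outLen)
{
    const int needed = static_cast<int>(std::strlen(in)) + 1; // +1 for the null
    if (out == nullptr)
        return needed;
    if (outLen < needed)
        return 0;
    for (int i = 0; i < needed; ++i) { // also copies the terminating null
        const unsigned char b = static_cast<unsigned char>(in[i]);
        out[i] = b < 0x80 ? static_cast<char16_t>(b) : u'\uFFFD';
    }
    return needed;
}

// Two-call pattern: query the size, allocate, convert.
std::u16string convert_line(const char* ansiLine)
{
    const int nSize = to_wide_sketch(ansiLine, nullptr, 0);
    std::vector<char16_t> wide(nSize); // freed automatically
    const int written = to_wide_sketch(ansiLine, wide.data(), nSize);
    assert(written == nSize);
    return std::u16string(wide.data(), written - 1); // drop the trailing null
}
```

With the real API on Windows the same shape applies: call it once with a null buffer, size a std::vector<WCHAR> from the result, then call it again with vector.data().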



However, if the ANSI input buffer has a fixed size, the same size can also be used for the output buffer, because the converted string never contains more wide characters than there are ANSI bytes in the input string:

WCHAR wUnicodeBuff[NMLANG_MaxNBuf];
while (fgets(chAnsiBuff, NMLANG_MaxNBuf, pFile) != NULL)
{
    // Separate input and output buffers; NMLANG_MaxNBuf WCHARs are always enough
    MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, wUnicodeBuff, NMLANG_MaxNBuf);

    // Write the BOM once, before the first line
    if (nBOM == 0) { arcOut.Write(&bom, 2); }
    arcOut.WriteString(wUnicodeBuff);

    nBOM++;
}
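The buffer-size argument can be checked with a small portable sketch: fgets stores at most N - 1 bytes plus a terminating null in an N-byte buffer, and a single-byte code page such as Windows-1255 yields exactly one UTF-16 unit per byte, so an N-element wide buffer always has room. kMaxBuf is a hypothetical stand-in for NMLANG_MaxNBuf, kept small so that long lines are split across several fgets calls, and the decoding is ASCII-only for brevity.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

constexpr int kMaxBuf = 8; // hypothetical stand-in for NMLANG_MaxNBuf (kept tiny)

// Reads `file` with fgets and converts each chunk into a fixed-size UTF-16
// buffer with the SAME element count as the byte buffer.
// Assumption: ASCII-only decoding (bytes >= 0x80 become U+FFFD).
std::u16string read_all_fixed(std::FILE* file)
{
    char ansiBuf[kMaxBuf];
    char16_t wideBuf[kMaxBuf];
    std::u16string result;
    while (std::fgets(ansiBuf, kMaxBuf, file) != nullptr) {
        const std::size_t n = std::strlen(ansiBuf);
        // fgets left at most kMaxBuf - 1 bytes, so n units plus a null fit
        assert(n + 1 <= static_cast<std::size_t>(kMaxBuf));
        for (std::size_t i = 0; i < n; ++i) {
            const unsigned char b = static_cast<unsigned char>(ansiBuf[i]);
            wideBuf[i] = b < 0x80 ? static_cast<char16_t>(b) : u'\uFFFD';
        }
        wideBuf[n] = u'\0'; // mirrors the null that MultiByteToWideChar writes
        result.append(wideBuf, n);
    }
    return result;
}
```

A line longer than the buffer is simply read in several pieces; the pieces are converted one after another and still produce the complete text.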



That should work. If the result is still not as expected, check the other functions involved, such as arcOut.WriteString(), check that the BOM is correct, and check that your input file really is encoded with code page nlanguageCodePage.
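One way to check the BOM is to look at the first bytes of the output file. Assuming bom holds the 16-bit value 0xFEFF, writing it with arcOut.Write(&bom, 2) on Windows (a little-endian platform) should produce the bytes FF FE, which mark UTF-16LE. The helper detect_bom is hypothetical and recognizes only the UTF-8 and UTF-16 signatures (it ignores UTF-32).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Classifies the first bytes of a file as a byte-order mark.
// UTF-16 text written from WCHAR buffers on Windows should start with FF FE.
std::string detect_bom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "none";
}
```

If the converted file reports "none" or "UTF-16BE", the BOM write (or the value stored in bom) is the first thing to look at.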


Another possible cause is the arcOut.WriteString() call, if it converts the Unicode string back to ANSI. In that case you can use a binary write instead:

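// With -1 as the input length, len includes the terminating null character
int len = MultiByteToWideChar(nlanguageCodePage, 0, chAnsiBuff, -1, wUnicodeBuff, NMLANG_MaxNBuf);

// Write the BOM once, before the first line
if (nBOM == 0) { arcOut.Write(&bom, 2); }
// Write only the characters, not the terminating null (len - 1 WCHARs)
if (len > 1)
    arcOut.Write(wUnicodeBuff, (len - 1) * sizeof(WCHAR));

nBOM++;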
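The byte layout of this binary write can be sketched portably. With -1 as the input length, the count returned by MultiByteToWideChar includes the terminating null character, so writing len * sizeof(WCHAR) bytes would put a 00 00 pair after every line; the sketch writes only the len - 1 real characters. append_utf16le is a hypothetical helper that appends to a byte vector instead of a CArchive, and char16_t stands in for WCHAR.

```cpp
#include <cassert>
#include <vector>

// Appends UTF-16 text to `out` as little-endian bytes (the layout a binary
// write of a WCHAR buffer produces on Windows), optionally preceded by the
// BOM FF FE. `units` is a count that includes the terminating null, as
// returned by a conversion called with -1; the null itself is not written.
void append_utf16le(std::vector<unsigned char>& out, const char16_t* text,
                    int units, bool writeBom)
{
    if (writeBom) {
        out.push_back(0xFF);
        out.push_back(0xFE);
    }
    if (units <= 1)
        return; // conversion failed (0) or the line was empty (only the null)
    assert(text[units - 1] == u'\0');
    for (int i = 0; i < units - 1; ++i) {
        const char16_t c = text[i];
        out.push_back(static_cast<unsigned char>(c & 0xFF)); // low byte first
        out.push_back(static_cast<unsigned char>(c >> 8));   // then high byte
    }
}
```

For instance, a first line containing alef and "a" (3 units with the null) followed by a line "b" (2 units) gives the bytes FF FE D0 05 61 00 62 00, with no embedded zero pairs between lines.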




