Doubt - please help me

Problem description

Hi friends,
I have a file that contains Japanese strings. What my code does is:

read the file as ANSI and convert it to Unicode. I hard-coded the buffer size as 1024, and that causes problems. So I thought I would get the line-by-line count and allocate memory for the char and TCHAR buffers accordingly, but that is not working.

Here is the code:

do
    {
        Lengith = GetFileContent() + 1;

        HANDLE hFileMac = m_pFile;

        char*szMBuf = new char[Lengith];
        memset(szMBuf, 0, Lengith + 1);
        TCHAR*cszMBuf = new TCHAR[Lengith];
        memset(cszMBuf, 0, Lengith + 1);


        if ((lBytes = ::PReadFromFile(hFileMac, szMBuf, Lengith, TRUE)) < 0L)
        {
            ::CloseHandle(hFileMac);
            //  return SetMacroError(NMLANG_MacExcErr, szMBuf);
        }

        str1 = szMBuf;
        int nLen = MultiByteToWideChar(932, 0, str1, -1, NULL, NULL);
        int i = MultiByteToWideChar(932, 0, str1, -1, cszMBuf, nLen);

        if (nBOM == 0) { arcOut.Write(&bom, 2); }

        arcOut.WriteString(cszMBuf);


        memset(szMBuf, 0, lBytes + 1);


    } while (lBytes == Lengith);

Solution

There is more than one error in your code, and several possible reasons for the failure:
1. Reading an arbitrary number of bytes from a multibyte stream does not guarantee that you aren't cutting a multibyte sequence in half.
2. The assumption that the Unicode buffer never needs more characters than the multibyte buffer has bytes is correct, but the initialization to 0 is wrong. memset() takes a count in bytes (one char = 1 byte), while a wide char is 2 bytes, so half of your buffer is not cleared. You have to use wmemset() here instead.
3. As Richard said, your buffers are too small. If a buffer must hold the whole string plus the terminating null, allocate the number of TCHARs you need plus 1 for the terminating null. As written, memset() clears Lengith + 1 bytes in a char buffer that was allocated with only Lengith elements.
4. Because of point 1, a partial multibyte sequence read from disk can consume the terminating null as its trail byte. In that case the conversion either keeps reading memory past the buffer until it reaches a consistent multibyte sequence, or stops without writing the terminating null to the Unicode string.
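
The byte count versus character count mix-up in point 2 can be seen in a small portable sketch. This is not the asker's code: the helper names (count_nonzero, leftover_after_memset, leftover_after_wmemset) are made up for illustration, and wchar_t stands in for TCHAR as it would be in a Unicode build.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <vector>

// Counts how many elements of buf are still non-zero.
std::size_t count_nonzero(const std::vector<wchar_t>& buf) {
    std::size_t n = 0;
    for (wchar_t c : buf) {
        if (c != L'\0') ++n;
    }
    return n;
}

// The mistake from the question: the element count is passed to memset(),
// which interprets it as a number of BYTES, so only part of the buffer
// is cleared.
std::size_t leftover_after_memset(std::vector<wchar_t>& buf) {
    std::memset(buf.data(), 0, buf.size());
    return count_nonzero(buf);
}

// wmemset() interprets the count as a number of wide CHARACTERS,
// so the whole buffer is cleared.
std::size_t leftover_after_wmemset(std::vector<wchar_t>& buf) {
    std::wmemset(buf.data(), L'\0', buf.size());
    return count_nonzero(buf);
}
```

Calling leftover_after_memset() on a buffer pre-filled with a non-zero character returns a non-zero count, while leftover_after_wmemset() returns 0. An equivalent fix with memset() is to pass buf.size() * sizeof(wchar_t) as the length.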

Unless you can pre-evaluate the multibyte sequence to find a safe place to split it, it is better to read and convert the whole file in one pass.
Allocate buffers of the correct size and initialize them correctly.
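
If the file does have to be processed in chunks, one way to "pre-evaluate" a chunk is to find how many of its bytes form complete characters and carry the rest over to the next read. The sketch below is only an illustration under stated assumptions: it hard-codes the lead-byte ranges of code page 932 (0x81-0x9F and 0xE0-0xFC), it assumes the chunk starts on a character boundary, and the function names are hypothetical. On Windows, IsDBCSLeadByteEx() could probably replace the hand-written range check.

```cpp
#include <cstddef>

// True if b is the lead byte of a two-byte character in code page 932
// (Shift-JIS). The ranges are hard-coded for this code page only.
bool is_cp932_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Returns how many bytes at the start of data[0..len) form complete
// characters. Assumes data[0] is the start of a character. If the chunk
// ends with a lead byte whose trail byte has not been read yet, that
// lead byte is excluded from the returned length.
std::size_t complete_prefix_length(const unsigned char* data, std::size_t len) {
    std::size_t i = 0;
    while (i < len) {
        std::size_t step = is_cp932_lead_byte(data[i]) ? 2 : 1;
        if (i + step > len) {
            break;  // lead byte without its trail byte: stop before it
        }
        i += step;
    }
    return i;
}
```

Only the first complete_prefix_length() bytes of a chunk would then be converted, with an explicit length instead of -1, and the remaining byte (at most one for this code page) would be moved to the front of the buffer before the next read.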


If the size is more than 10000 bytes, then it will be a problem.

