utfcpp and Win32 wide API

Question

Is it good/safe/possible to use the tiny utfcpp library to convert everything I get back from the wide Windows API (FindFirstFileW and such) to valid UTF-8 using utf16to8?

I would like to use UTF-8 internally, but I'm having trouble getting correct output (via wcout after another conversion, or via plain cout). Plain ASCII characters work, of course, but ñä comes out garbled.

Or is there an easier alternative?

Thanks!

UPDATE: Thanks to Hans (below), I now have an easy UTF-8 <-> UTF-16 conversion through the Windows API. Conversion works in both directions, but the UTF-8 string produced from the UTF-16 one has some extra characters that might cause me trouble later on. I'll share the code here out of pure friendliness :)

#include <windows.h>
#include <stdexcept>
#include <string>

// UTF16 -> UTF8 conversion
std::string toUTF8( const std::wstring &input )
{
    // get length; dwFlags is a DWORD, so pass 0 rather than NULL.
    // Because an explicit input length is passed (not -1), no terminating
    // NUL is counted or written.
    int length = WideCharToMultiByte( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0,
                                      NULL, NULL );
    if( !(length > 0) )
        return std::string();   // empty input (or a failed length query)
    else
    {
        std::string result;
        result.resize( length );

        if( WideCharToMultiByte( CP_UTF8, 0,
                                 input.c_str(), static_cast<int>( input.size() ),
                                 &result[0], static_cast<int>( result.size() ),
                                 NULL, NULL ) > 0 )
            return result;
        else
            throw std::runtime_error( "Failure to execute toUTF8: conversion failed." );
    }
}

// UTF8 -> UTF16 conversion
std::wstring toUTF16( const std::string &input )
{
    // get length (in wchar_t units, again without a terminating NUL)
    int length = MultiByteToWideChar( CP_UTF8, 0,
                                      input.c_str(), static_cast<int>( input.size() ),
                                      NULL, 0 );
    if( !(length > 0) )
        return std::wstring();  // empty input (or a failed length query)
    else
    {
        std::wstring result;
        result.resize( length );

        if( MultiByteToWideChar( CP_UTF8, 0,
                                 input.c_str(), static_cast<int>( input.size() ),
                                 &result[0], static_cast<int>( result.size() ) ) > 0 )
            return result;
        else
            throw std::runtime_error( "Failure to execute toUTF16: conversion failed." );
    }
}

Solution

The Win32 API already has a function to do this: WideCharToMultiByte() with CodePage = CP_UTF8. That saves you from having to rely on another library.
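To show what that conversion actually involves (and what utfcpp's utf16to8 likewise has to do), here is a minimal hand-written sketch: surrogate pairs are combined into one code point, and each code point becomes 1 to 4 UTF-8 bytes. It is not the Windows or utfcpp implementation. The name utf16ToUtf8Sketch is made up, it takes std::u16string so it builds on any compiler (on Windows wchar_t is 16 bits, so a std::wstring carries the same UTF-16 code units), and throwing on unpaired surrogates is my own choice; WideCharToMultiByte only reports such input as an error when you pass WC_ERR_INVALID_CHARS.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
std::string utf16ToUtf8Sketch(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size() * 3);  // each UTF-16 unit expands to at most 3 bytes

    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            // Combine the pair into a code point in U+10000..U+10FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;                                          // consume the low surrogate
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {       // low surrogate on its own
            throw std::runtime_error("unpaired low surrogate");
        }

        if (cp < 0x80) {                                  // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                          // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                        // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                          // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, u"\u00F1\u00E4" (ñä) becomes the four bytes C3 B1 C3 A4. Those are exactly the bytes that show up as mojibake when the console interprets them in its OEM code page.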

You cannot normally write the result straight to the console with cout or wcout. Console output goes through an 8-bit OEM code page for legacy reasons. You can change the console's output code page with SetConsoleOutputCP() (SetConsoleCP() only changes the input code page); 65001 is the code page for UTF-8 (CP_UTF8).
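A minimal sketch of the output side, assuming a reasonably recent Windows console and narrow UTF-8 strings written through std::cout. The function names useUtf8ConsoleOutput and printUtf8 are made up for this example, and the non-Windows branch simply assumes the terminal already expects UTF-8.

```cpp
#include <iostream>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Switch the console's *output* code page to UTF-8 (65001, CP_UTF8) so that
// UTF-8 bytes written with std::cout are decoded as UTF-8 rather than in the
// legacy OEM code page. Returns false if the call fails (e.g. no console).
bool useUtf8ConsoleOutput()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != FALSE;
#else
    return true;  // assumption: non-Windows terminals are already UTF-8
#endif
}

// Write an already UTF-8 encoded string unchanged. The stream passes the
// bytes through; it is the console's code page that decides how they look.
void printUtf8(std::ostream& os, const std::string& utf8)
{
    os << utf8 << '\n';
}
```

Call useUtf8ConsoleOutput() once at startup, then print strings produced by toUTF8() with plain std::cout. If characters then appear as boxes rather than question marks, the remaining problem is the console font.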

Your next stumbling block will be the font the console uses. You'll have to change it, but finding a fixed-pitch font with a full set of glyphs covering Unicode is going to be difficult. Empty rectangles (boxes) in the output mean you have a font problem; question marks mean you have an encoding problem.
