Why mask a char with 0xFF when converting a narrow string to a wide string?

Posted: 2020/9/27 23:40:31 | Tags: c++, c++11, wstring


Question

#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

std::wstring convert(const std::string& input)
{
    try
    {
        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
        return converter.from_bytes(input);
    }
    catch(std::range_error& e)
    {
        std::size_t length = input.length();
        std::wstring result;
        result.reserve(length);
        for(std::size_t i = 0; i < length; i++)
        {
            result.push_back(input[i] & 0xFF);
        }
        return result;
    }
}

I am having difficulty understanding the need for this expression in the fallback path:

result.push_back(input[i] & 0xFF);

Why is each character in the string being masked with 0xFF (0b11111111)?

Answer

Masking with 0xFF maps any negative values into the range 0-255: the char is promoted to int, and the AND keeps only its low 8 bits.

This is reasonable if, for example, your platform's char is an 8-bit signed type holding ISO-8859-1 characters and your wchar_t represents UCS-2, UTF-16 or UCS-4. In that case the byte values 0-255 are exactly the Unicode code points U+0000 to U+00FF, so the masked value is already the correct wide character.

Without this correction (or something similar, such as casting to unsigned char or std::byte), you would find that characters are sign-extended when promoted to the wider type. For example, on a platform where char is signed, the byte 0xA3 (£) would become -93 in a signed 32-bit wchar_t or 0xFFA3 in an unsigned 16-bit one, instead of U+00A3.

I think it's clearer to convert the char to an unsigned char: that works for any size of char and conveys the intent better. You can change that expression directly, or create a codecvt subclass that gives a name to what you're doing.

Here's how to write and use a minimal codecvt (for narrow → wide conversion only):
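#include <codecvt>
#include <locale>
#include <stdexcept>   // std::range_error
#include <string>

// Facet that decodes ISO-8859-1: every byte becomes the code point of the same value.
class codecvt_latin1 : public std::codecvt<wchar_t,char,std::mbstate_t>
{
protected:
    virtual result do_in(std::mbstate_t&,
                         const char* from,
                         const char* from_end,
                         const char*& from_next,
                         wchar_t* to,
                         wchar_t* to_end,
                         wchar_t*& to_next) const override
    {
        while (from != from_end && to != to_end)
            *to++ = static_cast<unsigned char>(*from++);  // zero-extend, no sign extension
        from_next = from;
        to_next = to;
        // Report partial if the output buffer filled up before the input ran out.
        return from == from_end ? result::ok : result::partial;
    }
};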

std::wstring convert(const std::string& input)
{
    using codecvt_utf8 = std::codecvt_utf8<wchar_t>;
    try {
        return std::wstring_convert<codecvt_utf8>().from_bytes(input);
    } catch (std::range_error&) {
        return std::wstring_convert<codecvt_latin1>{}.from_bytes(input);
    }
}

#include <iostream>

int main()
{
    std::locale::global(std::locale{""});

    // UTF-8:  £© おはよう (u8 literals are char arrays up to C++17; in C++20 they are char8_t and this call won't compile)
    std::wcout << convert(u8"\xc2\xa3\xc2\xa9 おはよう") << std::endl;
    // Latin-1: Â£©
    std::wcout << convert("\xc2\xa3\xa9") << std::endl;
}

Output:
£© おはよう
Â£©

