QString to unicode std::string


Question

I know there is plenty of information about converting QString to char*, but I still need some clarification in this question.

Qt provides QTextCodecs to convert QString (which internally stores characters in unicode) to QByteArray, allowing me to retrieve char* which represents the string in some non-unicode encoding. But what should I do when I want to get a unicode QByteArray?

QTextCodec* codec = QTextCodec::codecForName("UTF-8");
QString qstr = codec->toUnicode("Юникод");
std::string stdstr(reinterpret_cast<const char*>(qstr.constData()), qstr.size() * 2 );  // * 2 since unicode character is twice longer than char
qDebug() << QString(reinterpret_cast<const QChar*>(stdstr.c_str()), stdstr.size() / 2); // same

The above code prints "Юникод" as I expected. But I'd like to know whether that is the right way to get at the unicode char* of the QString. In particular, the reinterpret_casts and size arithmetic in this technique look pretty ugly.

Recommended answer

The below applies to Qt 5. Qt 4's behavior was different and, in practice, broken.

You need to choose:

  1. Whether you want the 8-bit wide std::string or 16-bit wide std::wstring, or some other type.

  2. What encoding is desired in your target string?

Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
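
To make the "one or two QChars" point concrete, here is a minimal standard-library sketch of how a single code point maps to UTF-16 code units. The helper name encodeUtf16 is hypothetical and is not part of Qt; it also assumes the input is a valid Unicode scalar value (at most 0x10FFFF and not itself a surrogate).

```cpp
#include <string>

// Hypothetical helper (not a Qt API): encode one Unicode code point as UTF-16.
// Assumption: cp is a valid scalar value, i.e. <= 0x10FFFF and not a surrogate.
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit code unit is enough.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
    return out;
}
```

With this sketch, U+042E (the Cyrillic "Ю" from the question) yields one code unit, while U+1F600 yields the pair 0xD83D, 0xDE00. This is why a QString's size() counts code units and can exceed the number of code points.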

Common cases:

  • Locally encoded 8-bit std::string (as in: system locale):

    std::string(str.toLocal8Bit().constData())

  • UTF-8 encoded 8-bit std::string:

    str.toStdString()
    

    This is equivalent to:

    std::string(str.toUtf8().constData())
    

  • UTF-16 or UCS-4 encoded std::wstring, 16- or 32 bits wide, respectively. The selection of 16- vs. 32-bit encoding is done by Qt to match the platform's width of wchar_t.

    str.toStdWString()
    

  • U16 or U32 strings of C++11 - from Qt 5.5 onwards:

    str.toStdU16String()
    str.toStdU32String()
    

  • UTF-16 encoded 16-bit std::u16string - this hack is only needed up to Qt 5.4:

    std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
    

    This encoding does not include byte order marks (BOMs).

    It's easy to prepend BOMs to the QString itself before converting it:

    QString src = ...;
    src.prepend(QChar::ByteOrderMark);
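    #if QT_VERSION < QT_VERSION_CHECK(5,5,0)
    // Pre-5.5 fallback: reinterpret QString's UTF-16 storage directly.
    // Parentheses plus an explicit cast avoid the int -> size_t narrowing
    // that brace initialization would reject.
    auto dst = std::u16string(reinterpret_cast<const char16_t*>(src.constData()),
                              static_cast<std::size_t>(src.size()));
    #else
    auto dst = src.toStdU16String();
    #endif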
    

    If you expect the strings to be large, you can skip one copy:

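    const QString src = ...;
    std::u16string dst;
    dst.reserve(src.size() + 1); // BOM + text; std::u16string manages its own terminator
    dst.push_back(char16_t(QChar::ByteOrderMark)); // append() has no single-character overload
    dst.append(reinterpret_cast<const char16_t*>(src.constData()),
               src.size());      // copying size()+1 units would embed a NUL in dst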
    

    In both cases, dst is now portable to systems with either endianness.
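
The portability claim depends on the receiving side actually inspecting the BOM. As a rough illustration of that reading side, the sketch below decodes raw UTF-16 bytes using only the standard library. The helper name decodeUtf16WithBom is hypothetical (not a Qt API), and returning an empty string when no BOM is present is an assumption made for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): decode raw UTF-16 bytes that start with a BOM.
// Assumption: input without a recognizable BOM is rejected by returning an empty string.
std::u16string decodeUtf16WithBom(const std::vector<std::uint8_t>& bytes)
{
    if (bytes.size() < 2)
        return {};

    bool bigEndian = false;
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        bigEndian = true;   // U+FEFF written high byte first
    else if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        bigEndian = false;  // U+FEFF written low byte first
    else
        return {};

    std::u16string out;
    // Skip the BOM, then rebuild each code unit in the byte order it announced.
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = bigEndian ? bytes[i] : bytes[i + 1];
        const unsigned lo = bigEndian ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

Feeding it the bytes FF FE 2E 04 (little-endian) or FE FF 04 2E (big-endian) produces the same single code unit 0x042E in both cases, which is the property the BOM provides.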
