QString to unicode std::string
Question
I know there is plenty of information about converting QString to char*, but I still need some clarification on this question.
Qt provides QTextCodecs to convert a QString (which internally stores characters in Unicode) to a QByteArray, allowing me to retrieve a char* that represents the string in some non-Unicode encoding. But what should I do when I want to get a Unicode QByteArray?
QTextCodec* codec = QTextCodec::codecForName("UTF-8");
QString qstr = codec->toUnicode("Юникод");
std::string stdstr(reinterpret_cast<const char*>(qstr.constData()), qstr.size() * 2 ); // * 2 since unicode character is twice longer than char
qDebug() << QString(reinterpret_cast<const QChar*>(stdstr.c_str()), stdstr.size() / 2); // same
The above code prints "Юникод" as I expected. But I'd like to know if that is the right way to get to the unicode char* of the QString. In particular, the reinterpret_casts and the size arithmetic in this technique look pretty ugly.
Recommended answer
The following applies to Qt 5. Qt 4 behaves differently and is, in practice, incorrect.
You need to choose:
- Whether you want the 8-bit wide std::string, the wide std::wstring (16 or 32 bits per character, depending on the platform), or some other type.
- What encoding is desired in your target string.
Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
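To make the "one or two QChars" point concrete, here is a minimal standard C++ sketch of how a single code point maps to UTF-16 code units. The function name encodeUtf16 is hypothetical and not part of Qt; the sketch assumes the input is a valid Unicode scalar value and does no validation.

```cpp
#include <string>

// Hypothetical helper (not a Qt API): encode one Unicode code point as
// UTF-16 code units. Assumes cp is a valid scalar value, i.e. at most
// 0x10FFFF and not itself a surrogate.
std::u16string encodeUtf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit code unit
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: a high surrogate followed by a low surrogate
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

A Cyrillic letter such as U+042E fits in one code unit, while a code point above U+FFFF, such as U+1F600, needs two. This is why QString::size() counts code units rather than characters.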
Common cases:
- Locally encoded 8-bit std::string (as in: system locale):
std::string(str.toLocal8Bit().constData())
- UTF-8 encoded 8-bit std::string:
str.toStdString()
For strings without embedded NUL characters, this is equivalent to:
std::string(str.toUtf8().constData())
- UTF-16 or UCS-4 encoded std::wstring, 16 or 32 bits wide, respectively. Qt selects the 16-bit or 32-bit encoding to match the platform's width of wchar_t:
str.toStdWString()
- U16 or U32 strings of C++11, from Qt 5.5 onwards:
str.toStdU16String()
str.toStdU32String()
- UTF-16 encoded 16-bit std::u16string; this hack is only needed up to Qt 5.4:
std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
This encoding does not include byte order marks (BOMs).
It's easy to prepend a BOM to the QString itself before converting it:
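QString src = ...;
src.prepend(QChar::ByteOrderMark);
#if QT_VERSION < QT_VERSION_CHECK(5,5,0)
// Parentheses rather than braces: size() returns int in Qt 5, and
// brace-initialization would reject the narrowing conversion to size_t.
auto dst = std::u16string(reinterpret_cast<const char16_t*>(src.constData()),
                          src.size());
#else
auto dst = src.toStdU16String();
#endif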
If you expect the strings to be large, you can skip one copy:
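const QString src = ...;
std::u16string dst;
dst.reserve(src.size() + 1); // BOM + text; std::u16string manages its own terminator
dst.push_back(char16_t(QChar::ByteOrderMark)); // append() has no single-character overload
// Copy exactly size() code units; size() + 1 would store the trailing NUL as part of the string
dst.append(reinterpret_cast<const char16_t*>(src.constData()),
           src.size());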
In both cases, dst is now portable to systems with either endianness.
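To show why the BOM helps, here is a hedged sketch of what the receiving side might do with the raw bytes of such a string. The helper decodeUtf16Bytes is hypothetical and not part of Qt. It assumes the data starts with a BOM and returns an empty string otherwise; a real reader would likely handle that case differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): rebuild a UTF-16 string from raw bytes,
// using the leading BOM (U+FEFF) to determine the byte order of the data.
// The BOM itself is consumed and not included in the result.
std::u16string decodeUtf16Bytes(const std::vector<std::uint8_t>& bytes)
{
    std::u16string out;
    if (bytes.size() < 2)
        return out;

    bool bigEndian;
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        bigEndian = true;   // bytes FE FF: big-endian data
    else if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        bigEndian = false;  // bytes FF FE: little-endian data
    else
        return out;         // assumption: no BOM means we give up

    // Assemble each code unit from its two bytes in the detected order
    for (std::size_t i = 2; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = bigEndian ? bytes[i] : bytes[i + 1];
        const unsigned lo = bigEndian ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

Whichever machine wrote the bytes, the reader recovers the same code units, because the first two bytes reveal the writer's byte order.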