What is QString::toUtf8 doing?


Problem description

This may sound like an obvious question, but I'm missing something about either how UTF-8 is encoded or how the toUtf8 function works.

Let's look at a very simple program:

QString str("Müller");
qDebug() << str << str.toUtf8().toHex();

Then I get the output:

"Müller" "4dc383c2bc6c6c6572" 

But I thought the letter ü should have been encoded as c3bc, not c383c2bc.

Thanks, Johan

Solution

It depends on the encoding of your source code.

Your source file is most likely already encoded in UTF-8, so the character ü is stored as the two bytes C3 BC.

You're calling the QString::QString(const char *str) constructor which, according to http://doc.qt.io/qt-4.8/qstring.html#QString-8, converts your string to Unicode using QString::fromAscii(), which by default treats the input as Latin-1.

Since C3 and BC are both valid Latin-1 bytes, representing à and ¼ respectively, converting them to UTF-8 yields the following byte sequences:

à (C3) -> C3 83

¼ (BC) -> C2 BC

which leads to the string you get: "4d c3 83 c2 bc 6c 6c 65 72"

To sum things up, it's double UTF-8 encoding.
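
The double encoding can be reproduced without Qt. The following is only a sketch in standard C++: latin1ToUtf8 and toHex are hypothetical helper names, not Qt functions. They model the two steps described above (read every byte as a Latin-1 character, then encode the result as UTF-8).

```cpp
#include <string>

// Treat every input byte as one Latin-1 character (code point == byte value)
// and encode it as UTF-8. This models the Qt 4 decoding step described above
// followed by toUtf8(); it is an illustration, not Qt's actual code.
std::string latin1ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII: one byte
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte: C2 or C3
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}

// Lowercase hex dump, comparable to what QByteArray::toHex() prints.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : bytes) {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}
```

With these helpers, toHex(latin1ToUtf8("M\xC3\xBCller")) gives "4dc383c2bc6c6c6572", the output from the question, while toHex(latin1ToUtf8("M\xFCller")) gives "4dc3bc6c6c6572".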

There are several options to solve this issue:

1) You can convert your source file to Latin-1 using your favorite text editor.

2) You can escape the ü character as \xFC in the string literal, so the string won't depend on the file's encoding.

3) You can keep the file and string as UTF-8 data and use QString str = QString::fromUtf8("Müller");
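
Why options 2 and 3 end up with the same string can be checked at the level of code points. This is a rough sketch in standard C++, not Qt code: decodeLatin1 and decodeUtf8 are hypothetical stand-ins for what a Latin-1 decoder and a UTF-8 decoder do, and the UTF-8 decoder assumes well-formed input and performs no validation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Latin-1: every byte is one code point with the same value.
std::vector<char32_t> decodeLatin1(const std::string& bytes) {
    std::vector<char32_t> out;
    for (unsigned char c : bytes) out.push_back(c);
    return out;
}

// Minimal UTF-8 decoder. Assumption: the input is well-formed UTF-8;
// invalid or overlong sequences are not detected.
std::vector<char32_t> decodeUtf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Keep only the payload bits of the lead byte.
        char32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
        for (int k = 1; k <= extra && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Here decodeLatin1("M\xFCller") (option 2) and decodeUtf8("M\xC3\xBCller") (option 3) both produce six code points with U+00FC in second position, whereas decodeLatin1("M\xC3\xBCller") produces seven, which is the mismatch behind the original output.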

Update: This issue is no longer relevant in Qt 5. http://doc.qt.io/qt-5/qstring.html#QString-8 states that the constructor now uses QString::fromUtf8() internally instead of QString::fromAscii(). So, as long as UTF-8 is used consistently, the constructor handles such literals correctly by default.
