Printing UTF-8 strings with printf - wide vs. multibyte string literals

Views: 1139 | Published: 2016/8/17 21:17:22 | Tags: c, unicode, utf-8, printf, multibyte


Question

In statements like these, where both are entered into the source code with the same encoding (UTF-8) and the locale is set up properly, is there any practical difference between them?

printf("ο Δικαιοπολις εν αγρω εστιν\n");
printf("%ls", L"ο Δικαιοπολις εν αγρω εστιν\n");

And consequently is there any reason to prefer one over the other when doing output? I imagine the second performs a fair bit worse, but does it have any advantage (or disadvantage) over a multibyte literal?

EDIT: There are no issues with these strings printing. But I'm not using the wide string functions, because I want to be able to use printf etc. as well. So the question is are these ways of printing any different (given the situation outlined above), and if so, does the second one have any advantage?

EDIT2: Following the comments below, I now know that this program works -- which I thought wasn't possible:

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main()
{
    setlocale(LC_ALL, "");
    wprintf(L"ο Δικαιοπολις εν αγρω εστιν\n");  // wide output
    freopen(NULL, "w", stdout);                 // lets me switch
    printf("ο Δικαιοπολις εν αγρω εστιν\n");    // byte output
}

EDIT3: I've done some further research by looking at what's going on with the two types. Take a simpler string:

wchar_t *wides = L"£100 π";
char *mbs = "£100 π";

The compiler is generating different code. The wide string is:

.string "\243"
.string ""
.string ""
.string "1"
.string ""
.string ""
.string "0"
.string ""
.string ""
.string "0"
.string ""
.string ""
.string " "
.string ""
.string ""
.string "\300\003"
.string ""
.string ""
.string ""
.string ""
.string ""

While the second is:

.string "\302\243100 \317\200"

And looking at the Unicode encodings, the second is plain UTF-8. The wide character representation is UTF-32. I realise this is going to be implementation-dependent.

So perhaps the wide character representation of literals is more portable? My system will not print UTF-16/UTF-32 encodings directly, so it is being automatically converted to UTF-8 for output.

Solution

printf("ο Δικαιοπολις εν αγρω εστιν\n");

prints the string literal (a char array in which the special characters are represented as multibyte characters). Although you might see the correct output, there are other problems you might run into while working with non-ASCII characters like these. For example:

char str[] = "αγρω";
printf("%zu %zu\n", sizeof(str), strlen(str));  // size_t needs %zu

outputs 9 8: each of these four Greek letters is encoded in UTF-8 as 2 chars (bytes), so strlen gives 8, and sizeof also counts the terminating null character.

When using the L prefix, the literal consists of wide characters (an array of wchar_t), and the %ls format specifier causes printf to convert these wide characters to multibyte characters in the encoding of the current locale (UTF-8 in a UTF-8 locale). Note that in this case the locale should be set appropriately; otherwise this conversion may fail and the output may be invalid:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "");
    printf("%ls", L"ο Δικαιοπολις εν αγρω εστιν");
    return 0;
}

But while some things might get more complicated when working with wide characters, other things might get much simpler and more straightforward. For example:

wchar_t str[] = L"αγρω";
printf("%zu %zu\n", sizeof(str) / sizeof(wchar_t), wcslen(str));  // size_t needs %zu

will output 5 4 as one would naturally expect.

Once you decide to work with wide strings, wprintf can be used to print wide characters directly. It's also worth noting that in the case of the Windows console, the translation mode of stdout should be explicitly set to one of the Unicode modes by calling _setmode:

#include <stdio.h>
#include <wchar.h>

#include <io.h>
#include <fcntl.h>
#ifndef _O_U16TEXT
  #define _O_U16TEXT 0x20000
#endif

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"%ls\n", L"ο Δικαιοπολις εν αγρω εστιν");  // %ls: standard wide-string specifier, also accepted by MSVC
    return 0;
}
