Handling special characters in C (UTF-8 encoding)
Problem description
I'm writing a small application in C that reads a simple text file and then outputs the lines one by one. The problem is that the text file contains special characters such as Æ, Ø and Å, among others. When I run the program in the terminal, each of those characters is printed as a "?".
Is there an easy fix?
Answer
First things first:
- Read the file into a buffer.
- Use libiconv or a similar library to convert the UTF-8 bytes to wchar_t, then use the wide-character handling functions such as wprintf().
- Use the wide-character functions in C! Most file/output handling functions have a wide-character variant (for example fgetwc(), fgetws() and fputws()).
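The buffer-to-wchar_t step above could look roughly like this. It is a minimal sketch that uses the POSIX iconv() interface (built into glibc; GNU libiconv exposes the same calls but may need linking with -liconv), and it assumes the implementation accepts "WCHAR_T" as the target encoding name, as glibc and GNU libiconv do. The function name utf8_to_wide is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: convert in_len bytes of UTF-8 into a NUL-terminated
   wide string in out (capacity out_cap wide characters).
   Returns the number of wide characters written, or (size_t)-1 on error
   (invalid or truncated UTF-8, output buffer too small, iconv unavailable). */
size_t utf8_to_wide(const char *in, size_t in_len, wchar_t *out, size_t out_cap)
{
    if (out_cap == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;                 /* iconv() takes a non-const char ** */
    size_t src_left = in_len;
    char *dst = (char *)out;
    size_t dst_left = (out_cap - 1) * sizeof(wchar_t); /* keep room for L'\0' */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    /* Whatever output space was not consumed tells us how many wchar_t were written. */
    size_t n = (out_cap - 1) - dst_left / sizeof(wchar_t);
    out[n] = L'\0';
    return n;
}
```

Once a UTF-8 locale is active, the resulting wide string can be printed with something like wprintf(L"%ls\n", buf).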
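Ensure that your terminal can handle UTF-8 output. Setting up the correct locale (for example, calling setlocale(LC_ALL, "") at program start so the C library picks up a UTF-8 locale from the environment) and manipulating the locale data can automate a lot of the file opening and conversion for you, depending on what you are doing.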
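A small sketch of the locale setup, assuming a POSIX system (nl_langinfo() and CODESET come from <langinfo.h>). The helper set_locale_is_utf8 is hypothetical: it switches the program's locale and reports whether the resulting character encoding is UTF-8, which tells you whether wide characters such as Æ, Ø and Å can be converted for output.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch the locale (an empty name means "take it from
   the environment", i.e. LC_ALL / LC_CTYPE / LANG) and report whether the
   resulting character encoding is UTF-8.
   Returns 1 for UTF-8, 0 for another encoding, -1 if the locale is unavailable. */
int set_locale_is_utf8(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return -1;
    /* nl_langinfo(CODESET) names the current encoding, e.g. "UTF-8". */
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

Calling set_locale_is_utf8("") once at the top of main(), before any wide-character I/O, is the usual pattern. A result of 0 means the environment does not select a UTF-8 locale, so non-ASCII wide characters cannot be converted for output.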
Remember that the width of a code point in UTF-8 is variable: each one is encoded in 1 to 4 bytes. This means you can't just seek to a byte offset and begin reading as you would with ASCII, because you might land in the middle of a code point. Good libraries can handle this for you in some cases.
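To make the variable-width point concrete, here is a sketch of a few byte-level helpers (all names are hypothetical). They rely only on the UTF-8 bit patterns: continuation bytes look like 10xxxxxx, so after an arbitrary seek you can step backwards until you reach a lead byte. They do not fully validate input (overlong encodings and out-of-range lead bytes slip through), so treat them as an illustration rather than a decoder.

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
   or 0 if b is a continuation byte (10xxxxxx) or an invalid lead byte. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;             /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx */
    return 0;
}

/* Move byte offset pos backwards to the first byte of the code point that
   contains it, so a seek that lands mid-character can resynchronize. */
size_t utf8_sync_back(const unsigned char *s, size_t pos)
{
    while (pos > 0 && (s[pos] & 0xC0) == 0x80)
        pos--;
    return pos;
}

/* Count code points (not bytes) in a NUL-terminated UTF-8 string by
   counting every byte that is not a continuation byte. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "A\xC3\x86" "B" ("AÆB"), strlen() reports 4 bytes while utf8_count() reports 3 code points, and utf8_sync_back() moves offset 2 (the second byte of Æ) back to offset 1.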
Here is some code (not mine) that demonstrates some usage of UTF-8 file reading and wide character handling in C.
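#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* Take the locale from the environment so UTF-8 can be decoded. */
    setlocale(LC_ALL, "");

    /* "ccs=UTF-8" is a C library extension (Microsoft CRT), not ISO C;
       where it is not recognized, the locale set above decides the decoding. */
    FILE *f = fopen("data.txt", "r, ccs=UTF-8");
    if (!f)
        return 1;

    /* Read one wide character (code point) at a time and print it in hex. */
    for (wint_t c; (c = fgetwc(f)) != WEOF;)
        printf("%04X\n", (unsigned int)c);

    fclose(f);
    return 0;
}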
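The sample prints code points in hex, while the question asks for printing the file's lines. A sketch of that with the wide-character line functions fgetws() and fputws(), assuming the program has already called setlocale(LC_ALL, "") and the environment selects a UTF-8 locale. The helper name copy_lines_wide and the 1024-character chunk size are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: copy text from in to out line by line using
   wide-character I/O, so the streams decode/encode according to the
   current locale. Lines longer than the buffer arrive in several chunks.
   Returns the number of newline-terminated lines copied, or -1 on error. */
long copy_lines_wide(FILE *in, FILE *out)
{
    wchar_t line[1024];
    long lines = 0;

    while (fgetws(line, (int)(sizeof line / sizeof line[0]), in) != NULL) {
        if (fputws(line, out) < 0)
            return -1;
        /* Only count chunks that actually end a line. */
        size_t len = wcslen(line);
        if (len > 0 && line[len - 1] == L'\n')
            lines++;
    }
    /* fgetws() also returns NULL on a read or decoding error. */
    return ferror(in) ? -1 : lines;
}
```

In main() this would be called as copy_lines_wide(f, stdout). Keep stdout wide-only afterwards: once fputws() has given the stream wide orientation, applying byte-oriented calls such as printf() to the same stream is undefined behavior.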