Handling special characters in C (UTF-8 encoding)


Problem description

I'm writing a small application in C that reads a simple text file and then outputs its lines one by one. The problem is that the file contains special characters such as Æ, Ø and Å. When I run the program in a terminal, each of those characters is printed as "?".

Is there an easy fix?

Solution

First things first:

  1. Read the file contents into a buffer
  2. Use libiconv or a similar library to convert the UTF-8 bytes to wchar_t, then use the wide-character handling functions such as wprintf()
  3. Use the wide character functions in C! Most file/output handling functions have a wide-character variant

Ensure that your terminal can handle UTF-8 output. Setting up the correct locale and working with the locale data can automate a lot of the file opening and conversion for you, depending on what you are doing.

Remember that the width of a code point (character) in UTF-8 is variable: one to four bytes. This means you can't just seek to an arbitrary byte offset and begin reading as you can with ASCII, because you might land in the middle of a code point. Good libraries can handle this in some cases.

Here is some code (not mine) that demonstrates some usage of UTF-8 file reading and wide character handling in C.

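#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* ",ccs=UTF-8" is a non-standard fopen() extension (glibc, Microsoft CRT).
       It is written without a space after the comma, which glibc requires. */
    FILE *f = fopen("data.txt", "r,ccs=UTF-8");
    if (!f)
        return 1;

    /* Read one decoded code point at a time and print its value in hex. */
    for (wint_t c; (c = fgetwc(f)) != WEOF;)
        printf("%04X\n", (unsigned int)c);

    fclose(f);
    return 0;
}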

Links

  1. libiconv
  2. Locale data in C/GNU libc
  3. Some handy info
  4. Another good Unicode/UTF-8 in C resource
