Handling special characters in C (UTF-8 encoding)


Question

I'm writing a small application in C that reads a simple text file and then outputs the lines one by one. The problem is that the text file contains special characters such as Æ, Ø and Å, among others. When I run the program in a terminal, each of those characters is printed as a "?".

Is there a simple fix?

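Recommended answer

First things first:


  1. Read in the buffer

  2. Use libiconv or similar to obtain wchar_t type from UTF-8, and use the wide character handling functions such as wprintf()

  3. Use the wide character functions in C! Most file/output handling functions have a wide-character variant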
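
Step 2 above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name utf8_to_wide is made up for this example, and it assumes an iconv implementation that accepts the encoding name "WCHAR_T" (GNU iconv does; others may use a different name).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string into wide
   characters. Returns the number of wchar_t written (not counting the
   terminator), or (size_t)-1 on failure. */
size_t utf8_to_wide(const char *utf8, wchar_t *out, size_t out_cap)
{
    assert(utf8 != NULL && out != NULL && out_cap > 0);

    /* Assumption: the iconv implementation knows the name "WCHAR_T". */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in_ptr = (char *)utf8;   /* iconv() takes a non-const pointer */
    size_t in_left = strlen(utf8);
    char *out_ptr = (char *)out;
    size_t out_left = (out_cap - 1) * sizeof(wchar_t); /* room for L'\0' */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;         /* invalid input or output buffer too small */

    size_t written = (size_t)((wchar_t *)out_ptr - out);
    out[written] = L'\0';
    return written;
}
```

After a successful conversion the buffer can be printed with wprintf(L"%ls\n", buf), provided setlocale(LC_ALL, "") has selected a locale whose encoding the terminal understands.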

Ensure that your terminal can handle UTF-8 output. Having the correct locale set up and manipulating the locale data can automate a lot of the file opening and conversion for you, depending on what you are doing.
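
One way to check the locale setup is to adopt the environment's locale at program start and ask which character encoding it uses. The sketch below assumes a POSIX system (nl_langinfo is POSIX, not ISO C), and the function name use_environment_locale is made up for this example.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switches from the default "C" locale to whatever the
   environment (LANG, LC_ALL, ...) specifies. Returns 1 if the resulting
   locale uses UTF-8 as its character encoding, 0 otherwise. */
int use_environment_locale(void)
{
    /* An empty string means "take the locale from the environment". */
    if (setlocale(LC_ALL, "") == NULL)
        return 0;

    /* CODESET names the character encoding of the current locale. */
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}
```

If this returns 0, the wide-character I/O functions will not treat the program's input and output as UTF-8, which is one likely source of "?" in the output.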

Remember that the width of a code point (character) in UTF-8 is variable: one to four bytes. This means you can't just seek to an arbitrary byte and begin reading as you would with ASCII, because you might land in the middle of a code point. Good libraries can handle this for you in some cases.
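
To make the variable-width point concrete, here is a small sketch that relies on one property of UTF-8: every byte after the first byte of a multi-byte code point has the bit pattern 10xxxxxx. The helper names are made up for this example, and the code assumes the buffer holds well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: given an arbitrary byte offset, step back to the
   first byte of the code point that contains it. */
size_t utf8_sync_back(const char *buf, size_t pos)
{
    assert(buf != NULL);
    while (pos > 0 && is_continuation((unsigned char)buf[pos]))
        pos--;
    return pos;
}

/* Hypothetical helper: counts the code points in len bytes by counting
   every byte that is not a continuation byte. */
size_t utf8_count(const char *buf, size_t len)
{
    assert(buf != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (!is_continuation((unsigned char)buf[i]))
            n++;
    return n;
}
```

For the four bytes of "aÆb" (0x61, 0xC3, 0x86, 0x62), seeking to offset 2 lands in the middle of Æ; utf8_sync_back moves the offset back to 1, and utf8_count reports three code points.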

Here is some code (not mine) that demonstrates UTF-8 file reading and wide character handling in C.

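#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* "ccs=UTF-8" is a non-standard fopen() extension (documented for
       Microsoft's C runtime; glibc has a similar ",ccs=" option). It asks
       the library to decode the file as UTF-8 for wide-character reads. */
    FILE *f = fopen("data.txt", "r, ccs=UTF-8");
    if (!f)
        return 1;

    /* Read one wide character at a time and print its value in hex. */
    for (wint_t c; (c = fgetwc(f)) != WEOF;)
        printf("%04X\n", (unsigned)c);

    fclose(f);
    return 0;
}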

Links

  1. libiconv
  2. Locale data in C/GNU libc
  3. Some handy info
  4. Another good Unicode/UTF-8 in C resource
