Trying to read a wide character gives EOF

Problem description

I've got a text file, foo.txt, with these contents:

R⁸2

I had a large program reading it and doing things with each character, but it always received EOF when it hit the ⁸. Here are the relevant portions of the code:

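#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(int argc, char *argv[]) {
    setlocale(LC_ALL, "");          /* locale comes from the environment */

    FILE *in = fopen(argv[1], "r");
    if (in == NULL) {
        perror(argv[1]);
        return 1;
    }

    while (1) {
        wint_t c = getwc(in);
        printf("%d ", wctob(c));    /* single-byte form of c, or EOF (-1) if it has none */

        if (c == -1)                /* -1 converts to wint_t; with glibc that equals WEOF */
            printf("Error %d: %s\n", errno, strerror(errno));

        if (c == WEOF)
            return 0;
    }
}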

It prints 82 -1 (the ASCII code for R, then -1, the value of EOF). No matter where I put the ⁸ in the file, it always reads as EOF. Edit: I added a check for errno, and it prints this:

Error 84: Invalid or incomplete multibyte or wide character

However, ⁸ is Unicode U+2078 'SUPERSCRIPT EIGHT'. I wrote it to foo.txt with cat, copy-pasting the character from fileformat.info. A hexdump of foo.txt shows:

0000000: 52e2 81b8 32                             R...2

What's going wrong?

Recommended answer

1. Check for WEOF instead of EOF

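EOF is meant for byte-oriented input such as getc; WEOF is its wide-character counterpart. getwc returns WEOF both at end of file and when it meets an invalid or incomplete multibyte sequence, in which case it sets errno to EILSEQ (error 84, "Invalid or incomplete multibyte or wide character", on Linux). Comparing the result against EOF (-1) only works where (wint_t)EOF happens to equal WEOF, as it does with glibc.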

In glibc's stdio.h:

#define EOF (-1)

In glibc's wchar.h:

#define WEOF (0xffffffffu)

2. Set the locale to one supporting Unicode

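The default locale of a C program is C (also called POSIX), which only covers ASCII. The question's setlocale(LC_ALL, "") takes the locale from the environment (LC_ALL, LC_CTYPE, LANG); if those are unset or name a non-UTF-8 locale, the program stays in an ASCII-only locale and getwc can reject the first byte of ⁸ with EILSEQ, as seen in the question. It can therefore be necessary to set a Unicode-capable locale explicitly. C.UTF-8 is available on most Linux systems, but it is not required by the C or POSIX standards, so check setlocale's return value (NULL means the requested locale was not found).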

setlocale(LC_ALL, "C.UTF-8");     /* every category */
setlocale(LC_CTYPE, "C.UTF-8");   /* or only LC_CTYPE, which controls multibyte conversion */

3. Use the proper type for wide characters

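The return value of getwc is not char, int, or even wchar_t; it is wint_t, an integer type that can hold every wchar_t value plus WEOF. Make sure your character variable c is of type wint_t: storing the result in a char truncates the character, and only wint_t is guaranteed to keep WEOF distinct from every valid character.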
