Trying to read wide char gives EOF
Question
I've got a text file, foo.txt, with these contents:
R⁸2
I had a large program reading it and doing things with each character, but it always received EOF when it hit the ⁸. Here are the relevant portions of the code:
setlocale(LC_ALL,"");
FILE *in = fopen(argv[1],"r");
while (1) {
    wint_t c = getwc(in);
    printf("%d ",wctob(c));
    if (c == -1)
        printf("Error %d: %s\n",errno,strerror(errno));
    if (c == WEOF)
        return 0;
}
It prints 82 -1 (the ASCII codes for R and EOF). No matter where I have the ⁸ in the file, it always reads as EOF. Edit: I added a check for errno, and it gives this:
Error 84: Invalid or incomplete multibyte or wide character
However, ⁸ is Unicode U+2078 'SUPERSCRIPT EIGHT'. I wrote it to foo.txt via cat, copy-pasting from fileformat.info. A hexdump of foo.txt shows:
0000000: 52e2 81b8 32 R...2
What's going wrong?
Recommended answer
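1. Check for WEOF instead of EOF
EOF is meant for single-byte characters; WEOF is for wide characters. getwc returns WEOF, never EOF, both at end of file and when a read or encoding error occurs, so compare its result against WEOF and then use feof, ferror, or errno to tell the cases apart. In the question, error 84 is EILSEQ on Linux: the bytes could not be decoded in the current locale, so the file had not actually ended.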
In stdio.h:
#define EOF (-1)
In wchar.h:
#define WEOF (0xffffffffu)
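To illustrate point 1, here is a minimal sketch of a read loop that tests against WEOF and then separates a real end of file from an error. The helper name count_wide_chars is hypothetical and not part of the original answer.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: counts wide characters until getwc() returns WEOF.
   Returns the count at a genuine end of file, or -1 if reading stopped
   because of an error (errno is then set, e.g. to EILSEQ for a byte
   sequence that is invalid in the current locale). */
long count_wide_chars(FILE *in)
{
    long count = 0;
    wint_t c;               /* wint_t so that WEOF is representable */

    errno = 0;
    while ((c = getwc(in)) != WEOF)
        count++;

    /* WEOF alone does not say why reading stopped; ask the stream. */
    if (ferror(in) || !feof(in))
        return -1;
    return count;
}
```

A caller that gets -1 back can print strerror(errno) to see the reason, which is how the "Invalid or incomplete multibyte or wide character" message in the question would surface.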
2. Set the locale to one supporting Unicode
The default locale of a C program is C, also called POSIX, which is only guaranteed to handle ASCII. It is sometimes necessary to call setlocale with an explicit locale name whose encoding supports Unicode. C.UTF-8 is available on many modern systems but not on all of them, so check that setlocale does not return NULL.
setlocale(LC_ALL,"C.UTF-8");
setlocale(LC_CTYPE,"C.UTF-8");
3. Use the proper type for wide characters
The return value of getwc isn't char, int, or even wchar_t; it's wint_t. Make sure that your character variable c is of type wint_t, so that it can hold WEOF as a value distinct from every valid wide character.
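As a sketch of point 3, the helper below keeps the result of getwc in a wint_t and converts it to wchar_t only after it is known not to be WEOF. The name read_wide_line and its buffer-based interface are hypothetical, chosen just for illustration.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reads wide characters into buf until a newline
   (consumed, not stored), WEOF, or until cap - 1 characters are stored.
   buf is always null-terminated when cap > 0. Returns the number of
   characters stored. */
size_t read_wide_line(FILE *in, wchar_t *buf, size_t cap)
{
    size_t n = 0;
    wint_t c;   /* not wchar_t: WEOF may not fit in wchar_t */

    if (cap == 0)
        return 0;
    while (n + 1 < cap && (c = getwc(in)) != WEOF && c != L'\n')
        buf[n++] = (wchar_t)c;   /* c is a real character here, not WEOF */
    buf[n] = L'\0';
    return n;
}
```

A return value of 0 is ambiguous between an empty line and the end of input, so a caller would still consult feof or ferror on the stream afterwards.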