Reading a UTF-16 CSV file by char
Question
Currently I am trying to read a UTF-16 encoded CSV file char by char and convert each char into ASCII so I can process it. I later plan to change my processed data back to UTF-16, but that is beside the point right now.
I know right off the bat I am doing this completely wrong, as I have never attempted anything like this before:
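#include <stdio.h>
#include <wchar.h>

int main(void)
{
    FILE *fp;
    int ch;
    if (!(fp = fopen("x.csv", "r"))) return 1;
    /* read one byte per iteration and stop at end of file */
    while ((ch = fgetc(fp)) != EOF)
    {
        ch = (wchar_t) ch;  /* these casts do not decode UTF-16: */
        ch = (char) ch;     /* each iteration still sees one raw byte */
        printf("%c", ch);
    }
    fclose(fp);
    return 0;
}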
Wishfully thinking, I was hoping that would work by magic for some reason, but that was not the case. How can I read a UTF-16 CSV file and convert it to ASCII? My guess is that since each UTF-16 char is two bytes (I think?), I'm going to have to read two bytes at a time from the file into a variable of some data type I'm not sure of. Then I guess I will have to check the bits of this variable to make sure it is valid ASCII and convert it from there? I don't know how I would do this, though, and any help would be great.
Recommended answer
You should use fgetwc. The code below assumes that the file starts with a byte-order mark and that a locale named en_US.UTF-16 is available.
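#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    /* setlocale returns NULL if the requested locale is not installed */
    if (!setlocale(LC_ALL, "en_US.UTF-16")) {
        fprintf(stderr, "locale en_US.UTF-16 is not available\n");
        return 1;
    }
    FILE *fp = fopen("x.csv", "rb");
    if (fp) {
        /* Read the BOM with fgetwc, not fgetc: mixing byte and wide
           reads on the same stream is not allowed in C. */
        wint_t bom = fgetwc(fp);
        /* 0xFFFE means the BOM decoded in the opposite byte order, so
           each ASCII byte ends up in the high half of the code unit. */
        int order = (bom == 0xFFFE);
        wint_t ch;
        while ((ch = fgetwc(fp)) != WEOF) {
            putchar(order ? ch >> 8 : ch);
        }
        putchar('\n');
        fclose(fp);
        return 0;
    } else {
        perror("opening x.csv");
        return 1;
    }
}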
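If no UTF-16 locale is installed, the approach the question guesses at also works without any locale. You read two bytes at a time, combine them into a 16-bit code unit using the byte order given by the BOM, and keep the unit only if it is below 0x80. The sketch below is an alternative to the fgetwc answer, not part of it. The function name utf16_stream_to_ascii is hypothetical, and so are two of its choices: a file without a BOM is treated as little-endian, and every non-ASCII character is written as '?'.

```c
#include <stdio.h>

/* Combine two bytes into one 16-bit UTF-16 code unit. */
static unsigned int utf16_unit(int b0, int b1, int big_endian)
{
    return big_endian ? ((unsigned int)b0 << 8) | (unsigned int)b1
                      : ((unsigned int)b1 << 8) | (unsigned int)b0;
}

/* Hypothetical helper: copy a UTF-16 stream to `out` as ASCII.
   Non-ASCII characters are written as '?' (assumption), and a stream
   without a BOM is treated as little-endian (assumption).
   Returns the number of characters written, or -1 if the input ends
   in the middle of a code unit (odd byte count). */
long utf16_stream_to_ascii(FILE *in, FILE *out)
{
    int big_endian = 0;
    long written = 0;
    int b0 = fgetc(in);
    int b1 = fgetc(in);

    /* Consume a byte-order mark if present: FE FF = big-endian, FF FE = little-endian. */
    if (b0 == 0xFE && b1 == 0xFF) {
        big_endian = 1;
        b0 = fgetc(in);
        b1 = fgetc(in);
    } else if (b0 == 0xFF && b1 == 0xFE) {
        b0 = fgetc(in);
        b1 = fgetc(in);
    }

    while (b0 != EOF) {
        if (b1 == EOF)
            return -1;              /* odd number of bytes: truncated unit */
        unsigned int unit = utf16_unit(b0, b1, big_endian);
        if (unit < 0x80) {
            fputc((int)unit, out);  /* plain ASCII: copy it through */
            written++;
        } else if (unit < 0xDC00 || unit > 0xDFFF) {
            fputc('?', out);        /* non-ASCII BMP char or high surrogate */
            written++;
        }
        /* A low surrogate (DC00-DFFF) is skipped: its high surrogate
           already produced the '?' for that character. */
        b0 = fgetc(in);
        b1 = fgetc(in);
    }
    return written;
}
```

To use it, open the file in binary mode with fopen("x.csv", "rb") and call utf16_stream_to_ascii(fp, stdout). A return value of -1 means the file had an odd number of bytes. For UTF-16 without a BOM, RFC 2781 says to assume big-endian, but files saved by Windows tools are commonly little-endian, which is why the sketch defaults that way. Change the initial value of big_endian if your files differ.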