Reading Unicode characters from a file in C


Question

I need to read Unicode characters from a file. The only thing I need from them is their Unicode code point number. I am running Windows XP and using Code::Blocks with MinGW.

I am doing something like this:

#define UNICODE
#ifdef UNICODE
#define _UNICODE
#else
#define _MBCS
#endif

    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>
    int main()
    {
        wchar_t *filename=L"testunicode.txt";
        FILE *infile;
        infile=_wfopen(filename,L"r");
        wchar_t result=fgetwc(infile);
        wprintf(L"%d",result);//To verify the unicode of character stored in file,print it   
        return 0;
    }

The result is always 255.

testunicode.txt is saved with Encoding = Unicode (created in Notepad).

The final task is to read from a file that can contain characters from any language, but wchar_t is only 2 bytes here, so will it be able to hold the Unicode value of every possible character?

I need your help.

Thank you all for your replies.

I have now changed the code:

#define UNICODE
#ifdef UNICODE
#define _UNICODE
#else
#define _MBCS
#endif

#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
int main()
{
    wchar_t *filename=L"testunicode.txt";
    FILE *infile;
    infile=_wfopen(filename,L"r");
    wchar_t  b[2];
    fread(b,2,2,infile);//Read a character from the file
    wprintf(L"%d",b[1]);
    return 0;
}

It prints the correct UTF-16 code. The project where this will be used needs to read characters from many different languages. So will UTF-16 suffice, or should we change the encoding of the stored files to UTF-32? Also, wchar_t is 2 bytes here, and for UTF-32 we need a data type with 4 bytes. How can that be accomplished?

Thanks again for your replies.

Recommended answer

Well, the code in your question only reads the first character of your file, so you will have to implement some kind of loop in order to process the whole contents of that file.

Now, fgetwc() is returning 255 (0xFF) for three reasons:

  • You're not taking the byte-order mark (BOM) of the file into account, so you end up reading it instead of the actual file contents.

  • You're not specifying a translation mode flag in the mode argument to _wfopen(), so it defaults to text, and fgetwc() accordingly tries to read a multibyte character instead of a wide character.

  • 0xFF (the first byte of a little-endian UTF-16 BOM) is probably not a lead byte in your program's current code page, so fgetwc() returns it without further processing.
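Putting those points together, one possible approach is to open the file in binary mode, skip the BOM, and loop over 16-bit code units. The sketch below is only an illustration under some assumptions: the file is little-endian UTF-16 (which is what Notepad writes when you choose "Unicode"), and the helper names read_unit_le, skip_utf16le_bom, next_code_point and print_code_points are made up for this example rather than taken from any library. It also touches on the 2-byte wchar_t concern: UTF-16 can represent every Unicode code point by using surrogate pairs, and a uint32_t is wide enough to hold the decoded value, so the files do not have to be converted to UTF-32.

```c
#include <stdio.h>
#include <stdint.h>

/* Read one little-endian 16-bit code unit.
   Returns 1 on success, 0 at end of file (or on a trailing odd byte). */
static int read_unit_le(FILE *f, uint16_t *unit)
{
    unsigned char b[2];
    if (fread(b, 1, 2, f) != 2)
        return 0;
    *unit = (uint16_t)(b[0] | (b[1] << 8));
    return 1;
}

/* Skip a UTF-16LE byte-order mark (bytes FF FE) if one is present.
   Returns 1 if a BOM was skipped, 0 otherwise. */
static int skip_utf16le_bom(FILE *f)
{
    long start = ftell(f);
    uint16_t unit;
    if (read_unit_le(f, &unit) && unit == 0xFEFF)
        return 1;
    fseek(f, start, SEEK_SET);   /* not a BOM: rewind to where we started */
    return 0;
}

/* Decode the next code point, combining a surrogate pair if needed.
   Returns 1 on success, 0 at end of file, -1 on malformed input. */
static int next_code_point(FILE *f, uint32_t *cp)
{
    uint16_t hi, lo;
    if (!read_unit_le(f, &hi))
        return 0;
    if (hi >= 0xD800 && hi <= 0xDBFF) {          /* high surrogate */
        if (!read_unit_le(f, &lo) || lo < 0xDC00 || lo > 0xDFFF)
            return -1;                           /* missing low surrogate */
        *cp = 0x10000u + (((uint32_t)hi - 0xD800u) << 10)
                       + ((uint32_t)lo - 0xDC00u);
        return 1;
    }
    if (hi >= 0xDC00 && hi <= 0xDFFF)            /* stray low surrogate */
        return -1;
    *cp = hi;
    return 1;
}

/* Print every code point in the file as U+XXXX, one per line.
   Returns 0 on a clean end of file, -1 on open failure or bad input. */
int print_code_points(const char *path)
{
    FILE *f = fopen(path, "rb");                 /* binary: no translation */
    uint32_t cp;
    int rc;
    if (f == NULL)
        return -1;
    skip_utf16le_bom(f);
    while ((rc = next_code_point(f, &cp)) == 1)
        printf("U+%04lX\n", (unsigned long)cp);
    fclose(f);
    return rc;
}
```

Calling print_code_points("testunicode.txt") from main() would then list the code point of every character in the file. The bytes are assembled by hand rather than read straight into a wchar_t so that the result does not depend on the size of wchar_t or on the byte order of the machine.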
