Read Japanese text from a UTF-8 text file

Problem description

I have created a text file with UTF-8 encoding and written some Japanese characters in it. Now I want to read this file, display its contents on the console, and also store the data in another file.

Recommended answer

The Microsoft SDK provides two functions to convert between character encodings: MultiByteToWideChar and WideCharToMultiByte.

To simplify your application's code, build it as a Unicode application (the default character set in recent Visual Studio versions).

Use MultiByteToWideChar to convert a UTF-8 string to wide characters. To print the result to the console, you may need to convert it to the encoding the console uses (call GetConsoleOutputCP to find out which code page that is). If the console's code page cannot represent your Japanese characters, you can change it with SetConsoleOutputCP. In all cases, the font used by the console must contain glyphs for the characters you print.
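One way to follow the SetConsoleOutputCP advice is sketched below, assuming you print the original UTF-8 bytes rather than converting the wide string back. The helper name `print_utf8` is hypothetical. On Windows it saves the code page reported by `GetConsoleOutputCP`, switches to `CP_UTF8` (65001), writes and flushes the bytes, then restores the old code page; on other systems the `#ifdef _WIN32` parts compile away and the bytes are written unchanged. How well UTF-8 output renders may still depend on the Windows version and console host, and the console font must contain the Japanese glyphs.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: write UTF-8 encoded text to a stream (stdout by default).
// Returns true if every byte was written.
bool print_utf8(const std::string& utf8, std::FILE* out = stdout) {
#ifdef _WIN32
    // Remember the console's current output code page, then switch to UTF-8
    // so the console interprets the bytes written below as UTF-8.
    const UINT previousCp = GetConsoleOutputCP();
    SetConsoleOutputCP(CP_UTF8);
#endif
    const std::size_t written = std::fwrite(utf8.data(), 1, utf8.size(), out);
    std::fflush(out);  // push the bytes out before the code page is restored
#ifdef _WIN32
    if (previousCp != 0) {  // 0 means there was no console to query
        SetConsoleOutputCP(previousCp);
    }
#endif
    return written == utf8.size();
}
```

For example, `print_utf8(text);` prints the UTF-8 string `text` to `stdout`.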

When writing to a file, you are free to use any encoding; the right choice depends mainly on the applications that will open the file.
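Two hypothetical helpers sketch common choices for the output file: keeping the text as UTF-8 (optionally with the `EF BB BF` byte order mark that some Windows programs use to recognize UTF-8), or writing the UTF-16 text produced by MultiByteToWideChar as UTF-16LE with an `FF FE` byte order mark. Both open the file in binary mode so no newline translation happens, and the UTF-16 writer emits each code unit low byte first so the result does not depend on the host's byte order.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: save UTF-8 text unchanged, optionally preceded by the
// UTF-8 byte order mark (EF BB BF).
bool write_utf8_file(const std::string& path, const std::string& utf8, bool withBom) {
    std::ofstream out(path, std::ios::binary);  // binary: bytes are written as-is
    if (!out) return false;
    if (withBom) out.write("\xEF\xBB\xBF", 3);
    out.write(utf8.data(), static_cast<std::streamsize>(utf8.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: save UTF-16 text (for example the result of a UTF-8 to
// wide-character conversion) as UTF-16LE with a byte order mark (FF FE).
bool write_utf16le_file(const std::string& path, const std::u16string& text) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    std::string bytes = "\xFF\xFE";  // little-endian byte order mark
    for (char16_t unit : text) {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    }
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

If the file will be read by tools that do not expect a BOM (many Unix command-line tools, for instance), pass `withBom = false`.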


[Edited based on the comments posted above]

For an example, see the tip "Handling simple text files in C/C++".
The general process is:

  • Get the size of the UTF-8 file
  • Allocate a buffer for the UTF-8 text
  • Open the file, read the content into the buffer, close the file
  • Call MultiByteToWideChar with CodePage = CP_UTF8, lpMultiByteStr = input buffer, cbMultiByte = file size, lpWideCharStr = NULL and cchWideChar = 0 to get the required length of the wide-character buffer
  • Allocate the wide-character buffer using the value returned by that call (add one element if you need a terminating null, because the count does not include one when cbMultiByte is an explicit size)
  • Call MultiByteToWideChar again, now passing the output buffer and its size
  • Do something with the wide string, such as printing it to the console
  • Free the buffers when they are no longer needed
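The steps above are sketched below as portable C++. Because MultiByteToWideChar is only available on Windows, the conversion is done by a hypothetical hand-written `utf8_to_utf16` that mimics its two-call convention: a first call with a zero-length output buffer returns the number of UTF-16 units needed, and a second call fills the buffer. On Windows you would call `MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, ...)` in its place with a `wchar_t` buffer. Skipping a leading UTF-8 BOM is an addition not in the steps above; some editors may write one.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical portable stand-in for
// MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, src, srcLen, dst, dstLen).
// With dstLen == 0 it only counts the UTF-16 code units needed; otherwise it
// writes them to dst. Returns 0 for invalid UTF-8 or a too-small buffer.
int utf8_to_utf16(const char* src, int srcLen, char16_t* dst, int dstLen) {
    static const std::uint32_t kMinForLength[] = {0, 0x80, 0x800, 0x10000};
    int count = 0;
    for (int i = 0; i < srcLen;) {
        const unsigned char lead = static_cast<unsigned char>(src[i]);
        int extra;         // continuation bytes that follow the lead byte
        std::uint32_t cp;  // decoded code point
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return 0;                      // invalid lead byte
        if (i + extra >= srcLen) return 0;  // sequence cut off at end of input
        for (int k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(src[i + k]);
            if ((c & 0xC0) != 0x80) return 0;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates and values above U+10FFFF.
        if (cp < kMinForLength[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += extra + 1;
        const int units = cp >= 0x10000 ? 2 : 1;
        if (dstLen > 0) {
            if (dst == nullptr || count + units > dstLen) return 0;  // buffer too small
            if (units == 1) {
                dst[count] = static_cast<char16_t>(cp);
            } else {  // encode as a surrogate pair
                cp -= 0x10000;
                dst[count] = static_cast<char16_t>(0xD800 + (cp >> 10));
                dst[count + 1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            }
        }
        count += units;
    }
    return count;
}

// Hypothetical helper following the listed steps: read a UTF-8 file into UTF-16.
// Returns false if the file cannot be read or is not valid UTF-8.
bool read_utf8_file_as_utf16(const std::string& path, std::u16string& out) {
    // Get the file size, allocate a buffer, read the content, close the file.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return false;
    const std::streamoff size = in.tellg();
    if (size < 0) return false;
    std::vector<char> bytes(static_cast<std::size_t>(size));
    in.seekg(0);
    if (size > 0 && !in.read(bytes.data(), size)) return false;
    in.close();
    // Skip a UTF-8 byte order mark (EF BB BF) if present.
    int start = 0;
    if (bytes.size() >= 3 && bytes[0] == '\xEF' && bytes[1] == '\xBB' && bytes[2] == '\xBF')
        start = 3;
    const int len = static_cast<int>(bytes.size()) - start;
    out.clear();
    if (len == 0) return true;  // empty file: nothing to convert
    // First call: query the number of UTF-16 units needed.
    const int needed = utf8_to_utf16(bytes.data() + start, len, nullptr, 0);
    if (needed == 0) return false;
    // Allocate the wide buffer, then convert for real with a second call.
    out.resize(static_cast<std::size_t>(needed));
    return utf8_to_utf16(bytes.data() + start, len, &out[0], needed) == needed;
}
```

Printing or saving the resulting `std::u16string` is left to the caller; the buffers are released automatically when the containers go out of scope.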

