How to input a custom character to a file using fstream in C++


Problem description


Can anyone tell me whether there is a way to write a custom character, say ซ, to a file? I tried it, but after writing it through my program and opening the file, I found it had been converted to ASCII (VT).

What I have tried:

Here is the piece of code I use to open the file and write my character:

fstream write_on;
fstream Read_From;
const char *Write_File_Name = "C:\\users\\Username\\Desktop\\pic1.txt";
wchar_t const buf[] = L"ซ";

write_on.open(Write_File_Name, ios::binary | ios::out);
write_on.write(buf, 1);
write_on.close();





Thanks

Recommended answer

In addition to Solution 1:

Please see my comment to the question.

You need to understand that wchar_t is implementation-dependent. On Windows in particular, it holds characters encoded as UTF-16LE, one of the Unicode encodings. Formally speaking, it does not have to be any particular encoding; it could be just some arbitrary data of the size of this type. Everything depends on how this data is interpreted.

This means that a character whose code point lies within the BMP is represented as two bytes, while all other characters use a pair of 16-bit words, called a surrogate pair. So a character has a 2-byte or 4-byte representation. Your case is 2 bytes, so you don't need an array of wchar_t, but in the general case you would. You would then need to write all the elements of this array to your file and read them back accordingly.

See also:
Wide character (Wikipedia),
BMP (Unicode) (Wikipedia),
Roadmap to the BMP,
https://www.gnu.org/software/libunistring/manual/html_node/The-wchar_005ft-mess.html,
FAQ - UTF-8, UTF-16, UTF-32 & BOM.

—SA


I don't think it is converted to ASCII. But you are writing only one byte to the file (the lower byte of your wide character). To write all bytes of the character, use:
write_on.write((const char*)buf, sizeof(wchar_t));







See comments. The general solution should be:

write_on.write((char*)buf, wcslen(buf) * sizeof(wchar_t));








With your example string "ซ", a single Unicode character (the Thai character SO SO) is written to the file. Assuming your platform uses UTF-16LE encoding for wide characters (like Windows), the Unicode code point is 0x0E0B and the binary file content will be 0x0B followed by 0x0E.

When reading such files later, you must know what kind of encoding was used. Or more generally: for each file you want to read, you must know how to interpret its content.

If you interpret the file as ASCII (or some 8-bit text), you will read the ASCII control characters 0x0B and 0x0E (VT and SO). But if you interpret it as UTF-16LE, you will get the code point 0x0E0B.
[/EDIT]

