Linux & C-Programming: How can I write UTF-8 encoded text to a file?

Question

I am interested in writing UTF-8 encoded strings to a file.

I did this with the low-level functions open() and write(). First, I set the locale to a UTF-8 aware one with setlocale(LC_ALL, "de_DE.utf8"). But the resulting file does not contain UTF-8 characters, only ISO-8859 encoded umlauts. What am I doing wrong?

Addendum: I don't know whether my strings are really UTF-8 encoded in the first place. I just keep them in the source file in this form: char *msg = "Rote Grütze";

See the screenshot for the content of the text file: http://img19.imageshack.us/img19/9791/picture1jh9.png

Recommended answer

Changing the locale won't change the actual data written to the file by write(); it writes the bytes you hand it unchanged. You have to actually produce UTF-8 encoded bytes in order to write them to a file. For that purpose you can use a library such as ICU.
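To make the point about write() concrete, here is a minimal sketch, assuming a POSIX system. The helper names write_all and save_text and the constant msg_utf8 are made up for this illustration. The ü is spelled out as its two UTF-8 bytes (0xC3 0xBC), so the result does not depend on how the editor saved the source file, and no setlocale() call is involved.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, looping over short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: create or truncate `path` and store `text`
 * byte for byte. write() performs no character set conversion. */
int save_text(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int rc = write_all(fd, text, strlen(text));
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* "Rote Grütze" with the ü written as explicit UTF-8 bytes.
 * The literal is split so that 't' cannot be read as part of the escape. */
const char *const msg_utf8 = "Rote Gr\xc3\xbc" "tze\n";
```

Calling save_text("out.txt", msg_utf8) from main() should leave a 13-byte file in which the umlaut occupies two bytes. A Latin-1 file with the same text would be 12 bytes long, with the single byte 0xFC for ü.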

Edit after your edit of the question: UTF-8 differs from ISO-8859 only in how it encodes the "special" symbols (ümlauts, áccénts, etc.), that is, anything outside ASCII. For text that contains none of these symbols, the two encodings are byte-for-byte identical. However, if your program contains strings with those symbols, you have to make sure your text editor saves the source file as UTF-8. Sometimes you just have to tell it to.

To sum up, the text you produce will be in UTF-8 if the strings within the source code are in UTF-8.
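If you are unsure which encoding a string literal ended up in, you can inspect its bytes at run time. The following is a rough sketch, not a complete validator: the function name looks_like_utf8 is invented for this example, and it only checks the lead-byte and continuation-byte structure (it does not reject every overlong or surrogate sequence). It is enough to tell a Latin-1 ü (the single byte 0xFC) from a UTF-8 ü (0xC3 0xBC).

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the NUL-terminated string s has the byte structure of
 * UTF-8: every lead byte is followed by the right number of continuation
 * bytes (10xxxxxx). Simplified structural check, for illustration only. */
bool looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        size_t extra;

        if (*p < 0x80)
            extra = 0;                      /* plain ASCII */
        else if (*p >= 0xC2 && *p <= 0xDF)
            extra = 1;                      /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0)
            extra = 2;                      /* 3-byte sequence */
        else if (*p >= 0xF0 && *p <= 0xF4)
            extra = 3;                      /* 4-byte sequence */
        else
            return false;                   /* stray or invalid lead byte */

        p++;
        while (extra-- > 0) {
            /* A NUL terminator fails this test too, so we never read
             * past the end of the string. */
            if ((*p & 0xC0) != 0x80)
                return false;
            p++;
        }
    }
    return true;
}
```

Passing the msg string from the question to this function gives a quick hint: a false result for "Rote Grütze" suggests the source file was saved as ISO-8859, which would explain the output described in the question.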

Another edit: Just to be sure, you can convert your source code to UTF-8 using iconv:

iconv -f latin1 -t utf8 file.c

This will convert all your Latin-1 strings to UTF-8, and when you print them they will definitely be in UTF-8. Note that iconv writes the converted text to standard output, so redirect it to a new file. If iconv stumbles over a strange character, or you see the output strings with strange characters (for example Ã¼ where you expected ü), then your strings were in UTF-8 already.
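For illustration, the Latin-1 to UTF-8 mapping can also be written by hand in C. This is a sketch under the assumption that the input really is ISO-8859-1, where every byte value equals its Unicode code point. The function name latin1_to_utf8 is made up here, and this is not how iconv is implemented internally.

```c
#include <stddef.h>

/* Convert NUL-terminated ISO-8859-1 text `in` to UTF-8 in `out`.
 * `out` must have room for 2 * strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the terminator. */
size_t latin1_to_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    for (; *p; p++) {
        if (*p < 0x80) {
            /* ASCII is encoded identically in both character sets. */
            out[n++] = (char)*p;
        } else {
            /* U+0080..U+00FF become two bytes: 110000xx 10xxxxxx. */
            out[n++] = (char)(0xC0 | (*p >> 6));
            out[n++] = (char)(0x80 | (*p & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the Latin-1 byte 0xFC (ü) yields 0xC3 0xBC. Feeding it text that is already UTF-8 encodes each of those two bytes again, giving 0xC3 0x83 0xC2 0xBC, which displays as Ã¼. That is the kind of "strange character" output that indicates the input was UTF-8 to begin with.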

Regards
