Cross-platform Unicode in C/C++: Which encoding to use?


Problem description

I'm currently working on a hobby project (C/C++) which is supposed to work on both Windows and Linux, with full support for Unicode. Sadly, Windows and Linux use different encodings, which makes our lives more difficult.

In my code I'm trying to keep the data as universal as possible, so that it is easy to handle on both Windows and Linux. On Windows, wchar_t is encoded as UTF-16 by default, and on Linux as UCS-4 (correct me if I'm wrong).
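The size difference is easy to observe directly. The following is a minimal sketch, assuming the usual toolchain behaviour (2-byte wchar_t on Windows, 4-byte wchar_t on typical Linux compilers); the function names are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Guesses which encoding form wchar_t carries, judged purely by its size.
// Assumption: 2 bytes means UTF-16 (Windows), 4 bytes means UTF-32/UCS-4
// (typical Linux toolchains). The C++ standard itself does not promise this.
inline std::string wchar_encoding_guess() {
    switch (sizeof(wchar_t)) {
        case 2:  return "UTF-16";
        case 4:  return "UTF-32";
        default: return "unknown";
    }
}

// Length, in wchar_t units, of a string holding the single code point U+1F600.
// With a 16-bit wchar_t the compiler emits a surrogate pair (2 units);
// with a 32-bit wchar_t it emits one unit.
inline std::size_t emoji_length_in_wchar_units() {
    return std::wstring(L"\U0001F600").size();
}
```

The same wide literal therefore has a different length on each platform, which is why code that indexes or counts wchar_t elements does not port cleanly.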

My software opens files ({_wfopen, UTF-16, Windows}, {fopen, UTF-8, Linux}) and writes data to them in UTF-8. That was all doable, until I decided to use SQLite.

SQLite's C/C++ interface allows for one- or two-byte encoded strings, that is, UTF-8 or UTF-16 (click). Of course this does not work with wchar_t on Linux, as wchar_t on Linux is 4 bytes by default. Therefore, writing to and reading from SQLite requires conversion on Linux.
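To illustrate the conversion this implies on Linux, here is a minimal sketch that turns UTF-32 code points into UTF-16 code units, the form expected by SQLite's UTF-16 entry points such as sqlite3_bind_text16. The function name utf32_to_utf16 and the choice to replace invalid input with U+FFFD are assumptions made for this example, not part of SQLite or of the original code.

```cpp
#include <string>

// Converts UTF-32 code points to UTF-16 code units.
// Code points above U+FFFF become a surrogate pair. Invalid input
// (lone surrogates or values above U+10FFFF) is replaced with U+FFFD,
// which is a policy chosen for this sketch.
inline std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

A 4-byte wchar_t string would first have to be copied element by element into a std::u32string. Storing UTF-8 and calling the plain (non-16) SQLite functions, which take UTF-8, avoids this step entirely.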

Currently the code is getting cluttered with platform-specific exceptions for Windows/Linux. I was hoping to stick to the standard idea of storing data in wchar_t:

  • wchar_t on Windows: file paths work without a problem, and so does reading from/writing to SQLite. Writing data to a file should be done in UTF-8 anyway.
  • wchar_t on Linux: an exception for file paths due to the UTF-8 encoding, conversion before reading from/writing to SQLite (wchar_t), and the same on Windows when writing data to a file.

After reading (here), I was convinced I should stick to wchar_t on Windows. But after getting all that to work, the trouble began with porting to Linux.

Currently I'm thinking of redoing it all to stick with plain char (UTF-8), because it works on both Windows and Linux, keeping in mind that I would need to run WideCharToMultiByte on every string on Windows to get UTF-8. Using plain char*-based strings would greatly reduce the number of platform-specific exceptions for Linux/Windows.
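A wide-to-UTF-8 helper along these lines might look as follows. This is a sketch only: wide_to_utf8 is a hypothetical name, the Windows branch relies on WideCharToMultiByte with CP_UTF8, and the other branch assumes that each wchar_t holds one UTF-32 code point, as is typical on Linux.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Converts a wchar_t string to UTF-8 (hypothetical helper).
inline std::string wide_to_utf8(const std::wstring& in) {
    if (in.empty()) return std::string();
#ifdef _WIN32
    // Windows: wchar_t holds UTF-16, so let the OS do the conversion.
    // First call asks for the required size, second call fills the buffer.
    const int n = static_cast<int>(in.size());
    int len = WideCharToMultiByte(CP_UTF8, 0, in.data(), n, nullptr, 0, nullptr, nullptr);
    if (len <= 0) return std::string();
    std::string out(static_cast<std::size_t>(len), '\0');
    WideCharToMultiByte(CP_UTF8, 0, in.data(), n, &out[0], len, nullptr, nullptr);
    return out;
#else
    // Assumption: one UTF-32 code point per wchar_t element.
    std::string out;
    for (wchar_t wc : in) {
        unsigned long cp = static_cast<unsigned long>(wc);
        // Invalid values are replaced with U+FFFD (a choice made for this sketch).
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
#endif
}
```

If strings are kept as UTF-8 throughout, a helper like this is only needed where text arrives from a wide-character API, rather than on every string.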

Do you have any experience with cross-platform Unicode? Any thoughts on simply storing data in UTF-8 instead of using wchar_t?

Solution

UTF-8 on all platforms, with just-in-time conversion to UTF-16 for Windows, is a common tactic for cross-platform Unicode.
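As a rough sketch of what this tactic can look like for file access: the application passes UTF-8 paths everywhere, and only the Windows branch converts to UTF-16 right before calling _wfopen. The names utf8_to_utf16 and fopen_utf8 are hypothetical helpers invented for this example; MultiByteToWideChar, _wfopen and fopen are the real platform calls.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>

// Windows only: UTF-8 -> UTF-16, done right at the API boundary.
inline std::wstring utf8_to_utf16(const std::string& in) {
    if (in.empty()) return std::wstring();
    const int n = static_cast<int>(in.size());
    // First call asks for the required length, second call fills the buffer.
    int len = MultiByteToWideChar(CP_UTF8, 0, in.data(), n, nullptr, 0);
    if (len <= 0) return std::wstring();
    std::wstring out(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, in.data(), n, &out[0], len);
    return out;
}
#endif

// Opens a file whose path and mode are given as UTF-8 strings.
// Returns nullptr on failure, like fopen.
inline std::FILE* fopen_utf8(const std::string& path, const std::string& mode) {
#ifdef _WIN32
    return _wfopen(utf8_to_utf16(path).c_str(), utf8_to_utf16(mode).c_str());
#else
    // Assumption: the file system accepts UTF-8 byte strings as names,
    // which is the common setup on Linux.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

With a wrapper like this, the rest of the code can call fopen_utf8("caf\xC3\xA9.txt", "rb") on either platform, and the platform difference stays confined to one function.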
