Stumped with Unicode, Boost, C++, codecvts


Question

In C++, I want to work with Unicode. After falling down the Unicode rabbit hole, I've ended up in a train wreck of confusion, headaches and locales.

In Boost, I've hit problems trying to use Unicode file paths and trying to use the Boost program options library with Unicode input. I've read whatever I could find on locales, codecvts, Unicode encodings and Boost.

My current attempt is to have a codecvt that takes a UTF-8 string and converts it to the platform's encoding (UTF-8 on POSIX, UTF-16 on Windows). I've been trying to avoid wchar_t.

The closest I've actually gotten is with Boost.Locale, trying to convert a UTF-8 string to a UTF-32 string on output.

#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>

int main()
{
  std::string data("Testing, 㤹");

  // Source and target locales, differing only in their encoding.
  std::locale fromLoc = boost::locale::generator().generate("en_US.UTF-8");
  std::locale toLoc   = boost::locale::generator().generate("en_US.UTF-32");

  // Take the wide-character codecvt facet from the target locale...
  typedef std::codecvt<wchar_t, char, mbstate_t> cvtType;
  cvtType const* toCvt = &std::use_facet<cvtType>(toLoc);

  // ...and graft it onto the source locale.
  std::locale convLoc = std::locale(fromLoc, toCvt);

  std::cout.imbue(convLoc);
  std::cout << data << std::endl;

  // Output is unconverted -- what?

  return 0;
}

I think I had some other kind of conversion working using wide characters, but I really don't know what I'm doing. At this point I don't even know what the right tool for the job is. Help?

Recommended answer

Okay, after a long few months I've figured it out, and I'd like to help people in the future.

First of all, the codecvt approach was the wrong way of doing it. Boost.Locale provides a simple way of converting between character sets in its boost::locale::conv namespace. Here's one example (there are others that are not based on locales).

#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>
namespace loc = boost::locale;

int main()
{
  loc::generator gen;
  // The target encoding is taken from the generated locale's name.
  std::locale blah = gen.generate("en_US.utf-32");

  std::string UTF8String = "Tésting!";
  // from_utf will also work with wide strings as it uses the character size
  // to detect the encoding.
  std::string converted = loc::conv::from_utf(UTF8String, blah);

  // Outputs a UTF-32 string.
  std::cout << converted << std::endl;

  return 0;
}

As you can see, if you replace "en_US.utf-32" with "", it'll output in the user's locale.

I still don't know how to make std::cout do this all the time, but the translate() function of Boost.Locale outputs in the user's locale.

As for using UTF-8 strings with the filesystem across platforms, it seems that's possible; here's a link to how to do it.
