C / C ++ UTF-8大/小写转换 [英] C / C++ UTF-8 upper/lower case conversions

查看：251 发布时间：2016/8/18 14:22:01 c++ c utf-8 case-conversion

本文介绍了C / C ++ UTF-8大/小写转换的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

问题：
有与一台机器上工作，并未能对其他（下面详细介绍），相应的测试用例的方法。我认为有一些错误的code，导致它在一台机器上工作的机会。不幸的是，我不能发现问题。

The Problem: There is a method with a corresponding test-case that works on one machine and fails on the other (details below). I assume there's something wrong with the code, causing it to work by chance on the one machine. Unfortunately I cannot find the problem.

请注意，性病::字符串和UTF-8编码的使用是要求我有没有真正的影响力。使用C ++的方法是完全正常的，但不幸的是我没能发现任何东西。因此，使用C-功能。

Please note that the usage of std::string and utf-8 encoding are requirements I have no real influence on. Using C++ methods would be totally fine, but unfortunately I failed to find anything. Hence the use of C-functions.

方法：

std::string firstCharToUpperUtf8(const string& orig) {
  std::string retVal;
  retVal.reserve(orig.size());
  std::mbstate_t state = std::mbstate_t();
  char buf[MB_CUR_MAX + 1];
  size_t i = 0;
  if (orig.size() > 0) {
    if (orig[i] > 0) {
      retVal += toupper(orig[i]);
      ++i;
    } else {
      wchar_t wChar;
      int len = mbrtowc(&wChar, &orig[i], MB_CUR_MAX, &state);
      // If this assertion fails, there is an invalid multi-byte character.
      // However, this usually means that the locale is not utf8.
      // Note that the default locale is always C. Main classes need to set them
      // To utf8, even if the system's default is utf8 already.
      assert(len > 0 && len <= static_cast<int>(MB_CUR_MAX));
      i += len;
      int ret = wcrtomb(buf, towupper(wChar), &state);
      assert(ret > 0 && ret <= static_cast<int>(MB_CUR_MAX));
      buf[ret] = 0;
      retVal += buf;
    }
  }
  for (; i < orig.size(); ++i) {
    retVal += orig[i];
  }
  return retVal;
}

测试：

TEST(StringUtilsTest, firstCharToUpperUtf8) {
  setlocale(LC_CTYPE, "en_US.utf8");
  ASSERT_EQ("Foo", firstCharToUpperUtf8("foo"));
  ASSERT_EQ("Foo", firstCharToUpperUtf8("Foo"));
  ASSERT_EQ("#foo", firstCharToUpperUtf8("#foo"));
  ASSERT_EQ("ßfoo", firstCharToUpperUtf8("ßfoo"));
  ASSERT_EQ("Éfoo", firstCharToUpperUtf8("éfoo"));
  ASSERT_EQ("Éfoo", firstCharToUpperUtf8("Éfoo"));
}

失败的测试（只发生在两台机器之一）：

The failed test (only happens on one of two machines):

Failure
Value of: firstCharToUpperUtf8("ßfoo")
  Actual: "\xE1\xBA\x9E" "foo"
Expected: "ßfoo"

这两个机安装的语言环境en_US.utf8。然而，他们使用不同版本的libc。它的工作原理与GLIBC_2.14独立它被编译并在另一台机器上不能正常工作，而它只能有编制的机器上，否则它缺乏正确的libc版本。

Both machine have the locale en_US.utf8 installed. They however use different versions of libc. It works on the machine with GLIBC_2.14 independent of where it was compiled and doesn't work on the other machine, while it can only be compiled there, because otherwise it lacks the proper libc version.

无论哪种方式，存在编译此code和同时失败运行它的机器。必须有一些错误的code和我不知道。指向C ++的方法（特别是STL），也将是巨大的。升压和其他图书馆应该避免由于其他外部的要求。

Either way, there is a machine that compiles this code and runs it while it fails. There has to be something wrong with the code and I wonder what. Pointing to C++ methods (STL in particular), would also be great. Boost and other libraries should be avoided due to other outside requirements.

C / C ++ UTF-8大/小写转换 [英] C / C++ UTF-8 upper/lower case conversions

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

C / C ++ UTF-8大/小写转换 [英] C / C++ UTF-8 upper/lower case conversions

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭