How to read a file which contains \uxxxx in VC++


Question


I have a .txt file whose contents are:



\u041f\u0435\u0440\u0432\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432\u043d\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u043d\u0435\u0442_\u043a\u0430\u043d\u0430\u043b


How can I read such a file to get a result like this:



"Первый_интерактивный_интернет_канал"

If I write:

string str = _T("\u041f\u0435\u0440\u0432\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u0430\u043a\u0442\u0438\u0432\u043d\u044b\u0439_\u0438\u043d\u0442\u0435\u0440\u043d\u0435\u0442_\u043a\u0430\u043d\u0430\u043b");


Then the result in str is correct, but if I read the same text from the file, str contains exactly what is in the file. I guess it is because '\u' becomes '\\u'. Is there a simple way to convert \uxxxx notation to the corresponding symbols in C++?
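The asker's guess is right in spirit: \uXXXX is a universal character name, and the compiler resolves it only inside literals in source code. Text read from a file at run time is never seen by the compiler, so it keeps the backslash, the 'u' and the four hex digits as six ordinary characters. A minimal illustration, assuming a C++11 compiler (the variable names are only illustrative):

```cpp
#include <string>

// Inside a C++ literal, the compiler replaces the universal character name
// \u041f with a single character: U+041F CYRILLIC CAPITAL LETTER PE.
const std::u16string from_literal = u"\u041f";

// A file is never processed by the compiler, so its text still holds six plain
// characters: '\\', 'u', '0', '4', '1', 'f'. Writing those same characters as a
// literal requires escaping the backslash.
const std::string as_read_from_file = "\\u041f";
```

This is why the escapes have to be decoded by the program itself after reading the file.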

Answer


Here is an example of MSalters's suggestion:

#include <iostream>
#include <string>
#include <fstream>
#include <algorithm>
#include <sstream>
#include <locale>

#include <boost/scoped_array.hpp>
#include <boost/regex.hpp>
#include <boost/numeric/conversion/cast.hpp>

// Replaces every \uXXXX escape in 'source' with the 16-bit code unit it denotes.
// Text outside the escapes is copied char by char into wchar_t (assumes ASCII input).
std::wstring convert_unicode_escape_sequences(const std::string& source) {
  const boost::regex regex("\\\\u([0-9A-Fa-f]{4})");  // NB: no support for non-BMP characters
  // The output is never longer than the input: each 6-char escape shrinks to one wchar_t.
  boost::scoped_array<wchar_t> buffer(new wchar_t[source.size()]);
  wchar_t* const output_begin = buffer.get();
  wchar_t* output_iter = output_begin;
  std::string::const_iterator last_match = source.begin();
  for (boost::sregex_iterator input_iter(source.begin(), source.end(), regex), input_end; input_iter != input_end; ++input_iter) {
    const boost::smatch& match = *input_iter;
    // Copy the plain text between the previous escape and this one.
    output_iter = std::copy(match.prefix().first, match.prefix().second, output_iter);
    // Parse the four hex digits captured by the regex.
    std::stringstream stream;
    stream << match[1].str();
    unsigned int value;
    stream >> std::hex >> value;
    *output_iter++ = boost::numeric_cast<wchar_t>(value);
    last_match = match[0].second;
  }
  // Copy whatever follows the last escape sequence.
  output_iter = std::copy(last_match, source.end(), output_iter);
  return std::wstring(output_begin, output_iter);
}

int wmain() {
  // Use the user's default locale so that std::wcout can print the Cyrillic text.
  std::locale::global(std::locale(""));
  const std::wstring filename = L"test.txt";
  // Visual C++ accepts a wide-character filename here.
  std::ifstream stream(filename.c_str(), std::ios::in | std::ios::binary);
  if (!stream) {
    std::wcerr << L"Cannot open " << filename << std::endl;
    return 1;
  }
  // Read the whole file into a std::string.
  stream.seekg(0, std::ios::end);
  const std::ifstream::streampos size = stream.tellg();
  stream.seekg(0);
  boost::scoped_array<char> buffer(new char[size]);
  stream.read(buffer.get(), size);
  const std::string source(buffer.get(), size);
  const std::wstring result = convert_unicode_escape_sequences(source);
  std::wcout << result << std::endl;
}
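If Boost is not available, the same decoding can be sketched with only the standard library. This is an alternative written under stated assumptions, not the answerer's or MSalters's code: it assumes the text around the escapes is ASCII, it returns UTF-8 in a std::string rather than a std::wstring (so it behaves the same whether wchar_t is 16 or 32 bits), and, unlike the regex version, it combines a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF) into one non-BMP character. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
static int hex_digit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Reads four hex digits starting at s[pos]; returns -1 if they are missing or not all hex.
static long read_hex4(const std::string& s, std::size_t pos) {
  if (pos + 4 > s.size()) return -1;
  long value = 0;
  for (std::size_t i = pos; i < pos + 4; ++i) {
    int d = hex_digit(s[i]);
    if (d < 0) return -1;
    value = value * 16 + d;
  }
  return value;
}

// Appends code point cp to out, encoded as UTF-8.
static void append_utf8(std::string& out, unsigned long cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Decodes \uXXXX escapes into UTF-8. Anything that is not a well-formed escape
// is copied through unchanged. An unpaired surrogate is encoded as-is, which is
// not valid UTF-8.
std::string decode_unicode_escapes(const std::string& in) {
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    long cp = -1;
    if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == 'u') cp = read_hex4(in, i + 2);
    if (cp < 0) {  // not an escape: copy one byte
      out += in[i++];
      continue;
    }
    i += 6;
    // Combine a high surrogate with an immediately following low surrogate.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() && in[i] == '\\' && in[i + 1] == 'u') {
      long low = read_hex4(in, i + 2);
      if (low >= 0xDC00 && low <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        i += 6;
      }
    }
    append_utf8(out, static_cast<unsigned long>(cp));
  }
  return out;
}
```

On Windows, the UTF-8 result can be converted to a std::wstring with MultiByteToWideChar and CP_UTF8 if the rest of the program works with wide strings.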




I'm always surprised how complicated seemingly simple things like this are in C++.

