Upgrade from Windows-1252 to UCS-2

Published: 2019/6/6 10:50:52


Problem description

I'm trying to find out what the steps look like to upgrade a program
(which is used on Windows and Unix) from Windows-1252 (the Windows "ANSI"
code page) to UCS-2. Currently the program reads and writes files encoded
in Windows-1252 but should be able to read files encoded in UCS-2, too.

As I don't want to deal with two character representations in the program
I plan to use UCS-2 internally. I should be able to simply use
std::wstring then? When Windows-1252 encoded files are read I have to
convert the data to UCS-2 though. My understanding is that it depends now
on the implementation of the C++ standard library if and what kind of
conversions are supported? I might need to use a third-party library like
the Dinkum Conversions Library which converts data on the fly or something
like UTF-8 CPP where I can call functions explicitly to convert between
character sets?

After converting everything to UCS-2 and storing it in std::wstring I
suppose I can use the well-known string functions to search, replace,
compare strings (including < and >) etc. Is my understanding correct that
I'm safe to use member functions of std::wstring as long as the character
set used is not multibyte?

Last but not least the program needs to save files again. It might make
sense to use UTF-8 here for backward compatibility (as other programs
might be able to read the files more easily if they support only
Windows-1252). Thus I would need another converter to make sure that
std::wstring is encoded in UTF-8 correctly which means I need a
third-party tool again?

Anything I might have missed?

Boris

Solution

Boris wrote:
> I'm trying to find out what the steps look like to upgrade a program
> (which is used on Windows and Unix) from Windows-1252 (the Windows
> "ANSI" code page) to UCS-2. Currently the program reads and writes files
> encoded in Windows-1252 but should be able to read files encoded in
> UCS-2, too.
>
> As I don't want to deal with two character representations in the
> program I plan to use UCS-2 internally. I should be able to simply use
> std::wstring then?

Yes.

> When Windows-1252 encoded files are read I have to
> convert the data to UCS-2 though. My understanding is that it depends
> now on the implementation of the C++ standard library if and what kind
> of conversions are supported? I might need to use a third-party library
> like the Dinkum Conversions Library which converts data on the fly or
> something like UTF-8 CPP where I can call functions explicitly to
> convert between character sets?

AFAIK a third-party library (or writing your own code) is the only way
to go. For Windows-1252 to UCS-2 why not write your own? It can't be
that hard.
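
As a rough illustration of the "write your own" route, the sketch below converts Windows-1252 bytes to 16-bit code points stored in a std::wstring. The names `kCp1252High` and `cp1252_to_ucs2` are made up for this example. Only the bytes 0x80..0x9F need a lookup table, since every other byte has the same value as its Unicode code point. The five bytes that Windows-1252 leaves undefined are mapped here to the C1 control with the same value, which is an assumption of this sketch and not the only possible policy. Note also that wchar_t is 16 bits wide on Windows but typically 32 bits on Unix systems, so the in-memory layout of the result differs between the two even though the values are the same.

```cpp
#include <string>

// Unicode code points for Windows-1252 bytes 0x80..0x9F.
// Assumption of this sketch: the undefined bytes 0x81, 0x8D, 0x8F, 0x90
// and 0x9D map to the C1 control character with the same value.
static const wchar_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

// Hypothetical helper: convert Windows-1252 bytes to UCS-2 code points.
std::wstring cp1252_to_ucs2(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b >= 0x80 && b <= 0x9F)
            out.push_back(kCp1252High[b - 0x80]);
        else
            out.push_back(static_cast<wchar_t>(b));  // identical to Latin-1
    }
    return out;
}
```

With this helper, the Windows-1252 byte 0x80 becomes U+20AC (the euro sign) and 0xE9 becomes U+00E9 (é).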

> After converting everything to UCS-2 and storing it in std::wstring I
> suppose I can use the well-known string functions to search, replace,
> compare strings (including < and >) etc. Is my understanding correct
> that I'm safe to use member functions of std::wstring as long as the
> character set used is not multibyte?

That's correct for UCS-2.
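
A small sketch of what this looks like in practice, assuming every character is a single wchar_t as it is with UCS-2: find and replace work position by position without splitting a character. The helper name `replace_all` is invented for this example. One thing worth keeping in mind is that < and > on std::wstring compare element values, so the resulting order is code point order and not a dictionary order for any particular language.

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` with `to`.
// Safe for UCS-2 text because one wchar_t always holds one whole character.
std::wstring replace_all(std::wstring text,
                         const std::wstring& from,
                         const std::wstring& to) {
    if (from.empty()) return text;  // avoid an endless loop
    std::wstring::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::wstring::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return text;
}

// Comparison is by element value: 'Z' (0x5A) sorts before 'a' (0x61).
bool code_point_less(const std::wstring& a, const std::wstring& b) {
    return a < b;
}
```

For example, `code_point_less(L"Zebra", L"apple")` is true, which is not what a user would expect from an alphabetical sort. Linguistic ordering would need a locale-aware collation facility on top of this.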

> Last but not least the program needs to save files again. It might make
> sense to use UTF-8 here for backward compatibility (as other programs
> might be able to read the files more easily if they support only
> Windows-1252). Thus I would need another converter to make sure that
> std::wstring is encoded in UTF-8 correctly which means I need a
> third-party tool again?

Some confusion here I think, UTF-8 and Windows-1252 are not the same.
The first is a character encoding, the second is a character set.

But yes, to convert UCS-2 to UTF-8 is another step for which you could
either get a third-party library or write your own code.
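
For the "write your own code" option, a minimal sketch of a UCS-2 to UTF-8 encoder follows. The function name `ucs2_to_utf8` is hypothetical. Because UCS-2 only covers U+0000..U+FFFF, each character needs at most three UTF-8 bytes. As an assumption of this sketch, surrogate values and anything above U+FFFF (possible where wchar_t is 32 bits wide) are replaced with U+FFFD instead of being encoded.

```cpp
#include <string>

// Hypothetical helper: encode UCS-2 code points as UTF-8 bytes.
std::string ucs2_to_utf8(const std::wstring& text) {
    std::string out;
    for (std::wstring::size_type i = 0; i < text.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(text[i]);
        // Assumption: values outside UCS-2 become U+FFFD (replacement char).
        if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) cp = 0xFFFD;

        if (cp < 0x80) {            // 1 byte: 0xxxxxxx (plain ASCII)
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With this encoder, U+00E9 (é) becomes the two bytes C3 A9 and U+20AC (the euro sign) becomes the three bytes E2 82 AC, while ASCII characters pass through unchanged.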

> Anything I might have missed?
>
> Boris

john

> Some confusion here I think, UTF-8 and Windows-1252 are not the same.
> The first is a character encoding, the second is a character set.

I want to take that back, Windows-1252 is an encoding too, but it's still
the case that it's not the same as UTF-8.

john

On Wed, 20 Jun 2007 15:35:25 +0900, John Harrison
<jo*************@hotmail.com> wrote:

>> Some confusion here I think, UTF-8 and Windows-1252 are not the same.
>> The first is a character encoding, the second is a character set.
>
> I want to take that back, Windows-1252 is an encoding too, but it's still
> the case that it's not the same as UTF-8.

Thanks, John! I should have clarified it better: The idea is that files
with an ASCII-compatible subset of UTF-8 look like normal ASCII files when
encoded in UTF-8 (so other programs can simply assume they are ASCII
files).

Boris
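
To make the compatibility point concrete: text that contains only code points up to U+007F produces exactly the same bytes in ASCII, Windows-1252 and UTF-8. Anything above that differs, for instance é is the single byte E9 in Windows-1252 but the two bytes C3 A9 in UTF-8, so a program that only understands Windows-1252 would misread it. The check below is a sketch with an invented name, `is_ascii_only`, that a program could run before saving to see whether a file will stay readable by such older tools.

```cpp
#include <string>

// Hypothetical helper: true if every character is in the ASCII range,
// in which case the UTF-8, ASCII and Windows-1252 encodings are identical.
bool is_ascii_only(const std::wstring& text) {
    for (std::wstring::size_type i = 0; i < text.size(); ++i) {
        if (static_cast<unsigned long>(text[i]) > 0x7F) return false;
    }
    return true;
}
```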
