Upgrade from Windows-1252 to UCS-2


Problem description

I'm trying to find out what the steps look like to upgrade a program
(which is used on Windows and Unix) from Windows-1252 (the Windows "ANSI"
code page) to UCS-2. Currently the program reads and writes files encoded
in Windows-1252 but should be able to read files encoded in UCS-2, too.

As I don't want to deal with two character representations in the program
I plan to use UCS-2 internally. I should be able to simply use
std::wstring then? When Windows-1252 encoded files are read I have to
convert the data to UCS-2 though. My understanding is that it depends now
on the implementation of the C++ standard library if and what kind of
conversions are supported? I might need to use a third-party library like
the Dinkum Conversions Library which converts data on the fly or something
like UTF-8 CPP where I can call functions explicitly to convert between
character sets?

After converting everything to UCS-2 and storing it in std::wstring I
suppose I can use the well-known string functions to search, replace,
compare strings (including < and >) etc. Is my understanding correct that
I'm safe to use member functions of std::wstring as long as the character
set used is not multibyte?

Last but not least the program needs to save files again. It might make
sense to use UTF-8 here for backward compatibility (as other programs
might be able to read the files more easily if they support only
Windows-1252). Thus I would need another converter to make sure that
std::wstring is encoded in UTF-8 correctly which means I need a
third-party tool again?

Anything I might have missed?

Boris

Solution

Boris wrote:

> I'm trying to find out what the steps look like to upgrade a program
> (which is used on Windows and Unix) from Windows-1252 (the Windows
> "ANSI" code page) to UCS-2. Currently the program reads and writes files
> encoded in Windows-1252 but should be able to read files encoded in
> UCS-2, too.
>
> As I don't want to deal with two character representations in the
> program I plan to use UCS-2 internally. I should be able to simply use
> std::wstring then?

Yes.
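
One detail worth keeping in mind for a program that runs on both Windows and Unix: wchar_t is 16 bits wide on Windows and typically 32 bits on Unix-like systems, so either can hold a UCS-2 code unit, but the in-memory layout differs. Reading a UCS-2 file therefore means assembling each code unit from two bytes rather than copying memory. The following is only a minimal sketch under stated assumptions: the helper name `ucs2_bytes_to_wide` is hypothetical, and files without a byte order mark are assumed to be little-endian.

```cpp
#include <string>

// Hypothetical helper: decode raw UCS-2 bytes into one wchar_t per code unit.
// A leading byte order mark selects the byte order; without one, little-endian
// is assumed here (an assumption, not something the thread specifies).
std::wstring ucs2_bytes_to_wide(const std::string& bytes) {
    std::string::size_type i = 0;
    bool big_endian = false;
    if (bytes.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFE && b1 == 0xFF) {
            big_endian = true;  // BOM FE FF: big-endian
            i = 2;
        } else if (b0 == 0xFF && b1 == 0xFE) {
            i = 2;              // BOM FF FE: little-endian
        }
    }
    std::wstring out;
    for (; i + 1 < bytes.size(); i += 2) {
        const unsigned hi = static_cast<unsigned char>(bytes[big_endian ? i : i + 1]);
        const unsigned lo = static_cast<unsigned char>(bytes[big_endian ? i + 1 : i]);
        out.push_back(static_cast<wchar_t>((hi << 8) | lo));
    }
    return out;  // a trailing odd byte, if any, is ignored
}
```

The byte order mark itself is dropped from the result, so the returned string starts with the first real character of the file.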

> When Windows-1252 encoded files are read I have to
> convert the data to UCS-2 though. My understanding is that it depends
> now on the implementation of the C++ standard library if and what kind
> of conversions are supported? I might need to use a third-party library
> like the Dinkum Conversions Library which converts data on the fly or
> something like UTF-8 CPP where I can call functions explicitly to
> convert between character sets?

AFAIK a third party library (or writing your own code) is the only way
to go. For Windows-1252 to UCS-2 why not write your own? It can't be
that hard.
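
To give an idea of how small such a hand-written converter can be: Windows-1252 is a single-byte encoding in which bytes 0x00..0x7F and 0xA0..0xFF have the same numeric value as their Unicode code points, and only the block 0x80..0x9F needs a lookup table. The sketch below relies on that. The helper name `cp1252_to_wide` is hypothetical, and mapping the five bytes that Windows-1252 leaves undefined to the C1 control code of the same value is an assumption; a real program might prefer to reject them.

```cpp
#include <string>

// Unicode code points for Windows-1252 bytes 0x80..0x9F.
// The bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252;
// here they map to the C1 control of the same value (an assumption).
static const wchar_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

// Hypothetical helper: convert Windows-1252 bytes to one wchar_t per character.
std::wstring cp1252_to_wide(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b >= 0x80 && b <= 0x9F) {
            out.push_back(kCp1252High[b - 0x80]);    // the irregular block
        } else {
            out.push_back(static_cast<wchar_t>(b));  // ASCII and 0xA0..0xFF map 1:1
        }
    }
    return out;
}
```

Every Windows-1252 character lands in the Basic Multilingual Plane, so the result is valid UCS-2 regardless of whether wchar_t is 16 or 32 bits wide.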

> After converting everything to UCS-2 and storing it in std::wstring I
> suppose I can use the well-known string functions to search, replace,
> compare strings (including < and >) etc. Is my understanding correct
> that I'm safe to use member functions of std::wstring as long as the
> character set used is not multibyte?

That's correct for UCS-2.
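
A brief illustration of what "safe" means here: since UCS-2 stores every character in exactly one element, positions returned by find() are character positions and replace() cannot split a character. The comparison operators work too, but they order strings by numeric code point value, not by dictionary rules. The sketch below shows both; the helper names `replace_all` and `code_point_less` are hypothetical.

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` in `text` with `to`.
std::wstring replace_all(std::wstring text,
                         const std::wstring& from,
                         const std::wstring& to) {
    if (from.empty()) return text;  // nothing sensible to search for
    std::wstring::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::wstring::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();           // continue after the inserted text
    }
    return text;
}

// operator< on std::wstring compares element values one by one, so the
// ordering follows code point numbers rather than any language's collation.
bool code_point_less(const std::wstring& a, const std::wstring& b) {
    return a < b;
}
```

With this ordering "Z" sorts before "a", and "é" (U+00E9) sorts after "z". If the program needs language-aware sorting, that is a separate problem from the choice of encoding.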

> Last but not least the program needs to save files again. It might make
> sense to use UTF-8 here for backward compatibility (as other programs
> might be able to read the files more easily if they support only
> Windows-1252). Thus I would need another converter to make sure that
> std::wstring is encoded in UTF-8 correctly which means I need a
> third-party tool again?

Some confusion here I think, UTF-8 and Windows-1252 are not the same.
The first is a character encoding, the second is a character set.

But yes, to convert UCS-2 to UTF-8 is another step for which you could
either get a third party library or write your own code.
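
For the "write your own code" route, the encoder is short because UCS-2 only covers code points up to U+FFFF, which need at most three UTF-8 bytes. This is a sketch under assumptions, not a vetted implementation: the helper name `ucs2_to_utf8` is hypothetical, and replacing surrogate values or anything above U+FFFF with U+FFFD is a choice made here for illustration.

```cpp
#include <string>

// Hypothetical helper: encode a string holding one BMP code point per wchar_t
// as UTF-8. Values that are not valid UCS-2 characters (surrogates, or anything
// above U+FFFF) are replaced with U+FFFD; that policy is an assumption.
std::string ucs2_to_utf8(const std::wstring& text) {
    std::string out;
    for (std::wstring::size_type i = 0; i < text.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(text[i]);
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0xFFFF) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Characters below U+0080 pass through as single bytes, which is why pure-ASCII text is byte-identical before and after encoding.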

> Anything I might have missed?
>
> Boris

john


> Some confusion here I think, UTF-8 and Windows-1252 are not the same.
> The first is a character encoding, the second is a character set.

I want to take that back, Windows-1252 is an encoding too, but it's still
the case that it's not the same as UTF-8.

john


On Wed, 20 Jun 2007 15:35:25 +0900, John Harrison
<jo*************@hotmail.com> wrote:

>> Some confusion here I think, UTF-8 and Windows-1252 are not the same.
>> The first is a character encoding, the second is a character set.
>
> I want to take that back, Windows-1252 is an encoding too, but it's still
> the case that it's not the same as UTF-8.

Thanks, John! I should have clarified it better: The idea is that files
with an ASCII-compatible subset of UTF-8 look like normal ASCII files when
encoded in UTF-8 (so other programs can simply assume they are ASCII
files).
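
The compatibility described here holds only for bytes below 0x80. As soon as a file contains a non-ASCII character the two encodings diverge: "é" is the single byte 0xE9 in Windows-1252 but the two bytes 0xC3 0xA9 in UTF-8, so a program that understands only Windows-1252 would display two wrong characters. A program could check whether a given file stays inside the shared subset with something like the following sketch (the helper name `is_ascii_only` is hypothetical).

```cpp
#include <string>

// Hypothetical helper: true when every byte is below 0x80, meaning the data
// reads identically as ASCII, as Windows-1252, and as UTF-8.
bool is_ascii_only(const std::string& bytes) {
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        if (static_cast<unsigned char>(bytes[i]) >= 0x80) {
            return false;
        }
    }
    return true;
}
```

When this returns false, the choice of output encoding matters for older readers of the file, and it may be worth letting the user pick between UTF-8 and Windows-1252 on save.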

Boris

