Portable Code that supports Unicode

Views: 75 | Published: 2019/6/5 12:15:10 | c


Problem description

Let's start off with:

class Nation {
public:
    virtual const char* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const char* GetName() const
    {
        return "Norway";
    }
};

Let's say we want to give the name of the nation in the nation's official
language... and so we want to use the Unicode character set to achieve this.

How does one go about using Unicode in portable code? Something like the
following?

typedef wchar_t UnicodeChar;

class Nation {
public:
    virtual const UnicodeChar* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const UnicodeChar* GetName() const
    {
        return L"Norway"; // Note the preceding L
    }
};

Would you use "wchar_t", or would you use "unsigned short"? (Unicode is 16-bit).

Furthermore, how do you go about making your code in such a way that it can
use either normal characters or wide characters? Microsoft do it something
like the following (you define the UNICODE macro if you're using Unicode):

#ifdef UNICODE
typedef wchar_t Character;
#define StringLiteral(x) L##x
#else
typedef char Character;
#define StringLiteral(x) x
#endif

class Nation {
public:
    virtual const Character* GetName() const = 0;
};

class Norway : public Nation {
public:
    virtual const Character* GetName() const
    {
        return StringLiteral("Norway");
    }
};

What do you think of this? At the moment I'm writing code which I want to
support the normal character set and also Unicode... but I want to keep it
portable!

Any suggestions on how to go about this? Is the Microsoft way decent enough?

-Tomás

Recommended answer

Tomás wrote:

> Let's start off with:
>
> class Nation {
> public:
>     virtual const char* GetName() const = 0;
> };
>
> class Norway : public Nation {
> public:
>     virtual const char* GetName() const
>     {
>         return "Norway";
>     }
> };

Why are you using char* instead of std::basic_string<char_type>?

> Let's say we want to give the name of the nation in the nation's official
> language... and so we want to use the Unicode character set to achieve this.

WHICH Unicode "character set"? There are several encodings, such as UTF-8,
UTF-16, UTF-32, UCS-2, UCS-4 as well as big and little endian versions.

> How does one go about using Unicode in portable code? Something like the
> following?
Unicode is still not part of the standard, so it is not portable.

> typedef wchar_t UnicodeChar;
>
> class Nation {
> public:
>     virtual const UnicodeChar* GetName() const = 0;
> };
>
> class Norway : public Nation {
> public:
>     virtual const UnicodeChar* GetName() const
>     {
>         return L"Norway"; // Note the preceding L
>     }
> };
>
> Would you use "wchar_t", or would you use "unsigned short"? (Unicode is 16-bit).

Not all Unicode is 16 bit, and not all 16 bit encodings are Unicode.
wchar_t is often not suitable for Unicode.
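
One way to see why wchar_t is a shaky foundation: the C++ standard does not fix its width, so the same code can get a 16-bit type on one platform and a 32-bit type on another. A minimal sketch, where the helper name `wchar_holds_any_code_point` is made up for illustration and is not part of any library:

```cpp
#include <cwchar>   // WCHAR_MAX

// Hypothetical helper: true when a single wchar_t can represent every
// Unicode code point (U+0000..U+10FFFF). It is typically false where
// wchar_t is 16 bits wide and true where it is 32 bits wide.
inline bool wchar_holds_any_code_point()
{
    return static_cast<unsigned long long>(WCHAR_MAX) >= 0x10FFFFull;
}
```

Code that depends on the answer being true is not portable to platforms where it is false.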

Until I was sure what I was doing, I would probably use:

class unicode_char {
    /* wrap wchar_t */
};

typedef std::basic_string<unicode_char> ustring;

> Furthermore, how do you go about making your code in such a way that it can
> use either normal characters or wide characters? Microsoft do it something
> like the following (you define the UNICODE macro if you're using Unicode):
>
> #ifdef UNICODE
> typedef wchar_t Character;
> #define StringLiteral(x) L##x
> #else
> typedef char Character;
> #define StringLiteral(x) x
> #endif

That's ugly and is not a model to be copied. If you need Unicode
support, just support Unicode.

Anyway, this is merely a way of supporting wide and narrow characters,
not encodings.
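
For reference, a compilable form of the wide/narrow switch under discussion might look like the sketch below. The names `Character`, `StringLiteral` and `UNICODE` come from the question; `STRING_LITERAL_PASTE`, `String` and `norway_name` are made up here for illustration. Note that the `L` prefix has to be attached with the token-pasting operator `##`, and that the switch only changes the character width, not the encoding.

```cpp
#include <string>

#ifdef UNICODE
typedef wchar_t Character;
#define STRING_LITERAL_PASTE(x) L##x   // glue the L prefix onto the literal
#else
typedef char Character;
#define STRING_LITERAL_PASTE(x) x
#endif
// The extra level lets a macro argument expand before the prefix is pasted on.
#define StringLiteral(x) STRING_LITERAL_PASTE(x)

// Hypothetical convenience typedef: a string of whichever width was selected.
typedef std::basic_string<Character> String;

inline String norway_name()
{
    return StringLiteral("Norway");
}
```

Every translation unit, and every library linked in, has to agree on whether `UNICODE` is defined, which is one reason the pattern attracts criticism.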

> class Nation {
> public:
>     virtual const Character* GetName() const = 0;
> };
>
> class Norway : public Nation {
> public:
>     virtual const Character* GetName() const
>     {
>         return StringLiteral("Norway");
>     }
> };
>
> What do you think of this? At the moment I'm writing code which I want to
> support the normal character set and also Unicode... but I want to keep it
> portable!
>
> Any suggestions on how to go about this? Is the Microsoft way decent enough?

I think you need to decide what exactly it is you are doing, and read up
on Unicode.

So far you have only demonstrated wide and narrow character support, and
nothing to do with encodings.

You need to decide on an internal representation, and then you need to
provide mappings to your OS of choice, probably through stream operators
and facets. I don't know what your definition of portable is.
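
As one possible reading of "decide on an internal representation, then map at the boundary", the sketch below keeps text internally as UTF-32 code points and converts to UTF-8 bytes only for output. It assumes C++11 or later (for `char32_t` and `std::u32string`); the function name `to_utf8` is made up for illustration, and a real program might use a library or stream facets instead.

```cpp
#include <string>

// Hypothetical helper: converts text held internally as UTF-32 code points
// into UTF-8 bytes. Invalid values (surrogates, anything above U+10FFFF)
// are replaced with U+FFFD, the replacement character.
inline std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this split, the rest of the program never needs to know which encoding the operating system or terminal expects; only the boundary function does.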

Ben Pope
--
I'm not just a number. To many, I'm known as a string...

Tomas wrote:

> (Unicode is 16-bit).

Unicode is defined on 21 bits (code points run from U+0000 to U+10FFFF).
You can use various encodings to represent it, like UTF-8, UTF-16 or
UTF-32 alias UCS-4.
There is also UCS-2 that Microsoft uses, but it doesn't support the
whole Unicode range.
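
To make the 21-bit point concrete, here is a minimal sketch of how UTF-16 represents a code point that does not fit in 16 bits. It assumes C++11 or later, and the function name `encode_utf16` is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// Code points above U+FFFF need two 16-bit units (a surrogate pair),
// which is the part of the range that UCS-2 cannot represent.
inline std::u16string encode_utf16(char32_t cp)
{
    // Precondition: a valid code point that is not itself a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;                                        // 20 bits remain
        out += static_cast<char16_t>(0xD800 | (cp >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));  // low surrogate
    }
    return out;
}
```

For example, U+00E5 (the letter å) comes out as a single unit, while U+1F600 comes out as the pair 0xD83D 0xDE00.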

If you need something with Random Access, you can only take UCS-2 or UCS-4.
If you only need a Reversible Container, UTF-8 or UTF-16 will do.
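
The distinction can be sketched for UTF-8, assuming well-formed input and C++11 or later. Stepping one code point forwards or backwards is cheap because continuation bytes are recognisable, but reaching the N-th code point still means walking the string. The helper names below are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// A UTF-8 continuation byte has the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: number of code points in a well-formed UTF-8 string.
// Every byte that is not a continuation byte starts a new code point.
inline std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char byte : utf8)
        if (!is_continuation(byte)) ++n;
    return n;
}

// Hypothetical helper: byte offset at which the code point just before
// `pos` starts (returns 0 when already at the beginning).
inline std::size_t previous_code_point(const std::string& utf8, std::size_t pos)
{
    if (pos == 0) return 0;
    --pos;
    while (pos > 0 && is_continuation(static_cast<unsigned char>(utf8[pos])))
        --pos;
    return pos;
}
```

The same reasoning applies to UTF-16 with surrogate pairs; only a fixed-width form gives constant-time indexing by code point.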

Anyway you shouldn't use pointers for strings, but string objects.

std::wstring can be used for UCS-2 or UCS-4 depending on your system.
Be aware that in the standard, though, std::wstring wasn't made for
Unicode. You'd better use something dedicated IMO.
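
A minimal sketch of the "string objects, not pointers" advice applied to the classes from the question. The choice of UTF-8 in a plain std::string is an assumption made for this example, not something the thread settles on; a dedicated Unicode string class would slot into the same place.

```cpp
#include <string>

// The interface returns a string object instead of a raw pointer.
// The encoding (UTF-8 here) is a documented convention, not something
// the type system enforces.
class Nation {
public:
    virtual ~Nation() {}
    virtual std::string GetName() const = 0;   // UTF-8 encoded
};

class Norway : public Nation {
public:
    virtual std::string GetName() const
    {
        return "Norge";   // the country's name in Norwegian (plain ASCII)
    }
};
```

Returning by value also removes any question about who owns the buffer behind a returned pointer.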

I don't think the UNICODE macro of Microsoft is a good idea. That makes
libs compiled with Unicode support incompatible with the ones which
aren't, etc.
Just make your application Unicode aware; compile flags that mess
everything up are useless.

I would advise to use Glib::ustring from glibmm.
It contains some nice tools about general Unicode stuff too.

There is also ICU from IBM that you could check out.

loufoque wrote:

> Tomas wrote:

> > (Unicode is 16-bit).
>
> Unicode is defined on 21 bits.
> You can use various encodings to represent it, like UTF-8, UTF-16 or
> UTF-32 alias UCS-4.
> There is also UCS-2 that Microsoft uses, but it doesn't support the
> whole Unicode range.
>
> If you need something with Random Access, you can only take UCS-2 or UCS-4.
> If you only need a Reversible Container, UTF-8 or UTF-16 will do.

What is "Reversible"? If UTF-16 is "reversible" then so must be UTF-32.

> Anyway you shouldn't use pointers for strings, but string objects.
>
> std::wstring can be used for UCS-2 or UCS-4 depending on your system.
> Be aware that in the standard, though, std::wstring wasn't made for
> Unicode. You'd better use something dedicated IMO.
>
> I don't think the UNICODE macro of Microsoft is a good idea. That makes
> libs compiled with Unicode support incompatible with the ones which
> aren't, etc.
> Just make your application Unicode aware; compile flags that mess
> everything up are useless.

I second that.

UTF-16 is also a big waste of time IMHO.

> I would advise to use Glib::ustring from glibmm.
> It contains some nice tools about general Unicode stuff too.
>
> There is also ICU from IBM that you could check out.
