I want unsigned char * string literals

Problem description

Hello,

Early on I decided that all text (what most people call "strings" [1])
in my code would be unsigned char *. The reasoning is that the elements
of these arrays are decidedly not signed. In fact, they may not even
represent complete characters. At this point I think of text as simple
binary blobs. What charset, character encoding and termination they use
should not be exposed in the interface used to operate on them.

But now I have a dilemma. C string literals are signed char *. With GCC
4 warning about every sign mismatch, my code is spewing warnings all
over the place and I'm trying to figure out what to do about it.

My current thought is to define a Windows style _T macro:

    #define _T(s) ((unsigned char *)s)

Use "text" functions like:

    int
    text_copy(const unsigned char *src, unsigned char *dst, int n)
    {
        while (n-- && *src) {
            *dst++ = *src++;
        ...

And abolish the use of traditional string functions (at least for "text").

The code might then look like the following:

    unsigned char buf[255];
    text_copy(_T("hello, world"), buf, sizeof(buf));

What do you think?

If I do the above I have a lot of work to do so if someone has a better
idea I'd really like to hear about it.

Mike

PS: If you have an opinion that is unfavorable (but professional) let's
hear it.

[1] I use the term "text" to mean stuff that may actually be displayed
to a user (possibly in a foreign country). I use the term "string"
to represent traditional 8 bit zero terminated char * arrays.
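
One way to make the proposal above concrete is sketched here. It is only an illustration, not the poster's actual code: the body of text_copy is elided in the post, so the name text_copy_bounded, its return value, and its habit of always writing a terminator are assumptions. The macro is renamed TEXT because identifiers that begin with an underscore followed by an uppercase letter, such as _T, are reserved for the implementation in C. It also parenthesizes its argument and keeps the pointer const, since modifying a string literal is undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the post's _T macro: view a string literal
   as read-only bytes. The argument is parenthesized and const is kept. */
#define TEXT(s) ((const unsigned char *)(s))

/* Hypothetical bounded copy (not the elided original). Copies at most
   n - 1 bytes from src to dst, stops at the first zero byte, always
   writes a terminating zero when n > 0, and returns the number of
   bytes copied, not counting the terminator. */
static size_t
text_copy_bounded(const unsigned char *src, unsigned char *dst, size_t n)
{
    size_t copied = 0;

    assert(src != NULL && dst != NULL);
    if (n == 0)
        return 0;               /* no room even for the terminator */

    while (copied + 1 < n && src[copied] != 0) {
        dst[copied] = src[copied];
        ++copied;
    }
    dst[copied] = 0;
    return copied;
}
```

With these definitions, text_copy_bounded(TEXT("hello, world"), buf, sizeof buf) on a 255-byte buffer copies 12 bytes and then writes the terminating zero.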

Solution


"Michael B Allen" <io****@gmail.com> wrote in message
news:20****************************@gmail.com...

> Hello,
>
> Early on I decided that all text (what most people call "strings" [1])
> in my code would be unsigned char *. The reasoning is that the
> elements of these arrays are decidedly not signed. In fact, they may not
> even represent complete characters. At this point I think of text as
> simple binary blobs. What charset, character encoding and termination
> they use should not be exposed in the interface used to operate on
> them.

char * for a list of human readable characters.
unsigned char * for a list of arbitrary bytes - almost always octets.
signed char * - very rare. Sometimes you might need a tiny integer. I will
resist mentioning my campaign for 64 bit ints.

unsigned char really ought to be "byte". Unfortunately a bad decision was
taken to treat characters and bytes the same way, and now we are stuck with
sizeof(char) == 1 byte.

If you start using unsigned char* for strings then, as you have found, you
will merrily break all the calls to string library functions. This can be
patched up by a cast, but the real answer is not to do that in the first
place.

Very rarely are you interested in the actual encoding of a character. A few
exceptions arise when you want to code lookup tables for speed, or write
low-level routines to convert from decimal to machine letter, or put text
into binary files in an agreed coding, but they are very few.
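
A small sketch of both points, under stated assumptions: text_length and is_vowel are hypothetical helpers invented for illustration, not part of any library. The first shows the cast that an unsigned char * interface needs before it can call strlen (GCC reports the uncast call under -Wpointer-sign). The second shows one of the rare cases where the numeric value matters, a lookup table, where a plain char is converted to unsigned char only at the point of indexing.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: length of a zero-terminated byte sequence.
   strlen takes const char *, so an unsigned char * argument needs a
   cast; passing it uncast requires a diagnostic from the compiler. */
static size_t
text_length(const unsigned char *text)
{
    assert(text != NULL);
    return strlen((const char *)text);
}

/* Hypothetical helper: table lookup keyed by character value.
   Plain char may be negative, so it is converted to unsigned char
   before being used as an array index. */
static int
is_vowel(char c)
{
    static const unsigned char vowel[UCHAR_MAX + 1] = {
        ['a'] = 1, ['e'] = 1, ['i'] = 1, ['o'] = 1, ['u'] = 1,
        ['A'] = 1, ['E'] = 1, ['I'] = 1, ['O'] = 1, ['U'] = 1
    };

    return vowel[(unsigned char)c];
}
```

Keeping strings as char and converting at the one place that indexes by value avoids a cast at every library call.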

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm


Michael B Allen <io****@gmail.com> writes:

> Early on I decided that all text (what most people call "strings" [1])
> in my code would be unsigned char *. The reasoning is that the elements
> of these arrays are decidedly not signed. In fact, they may not even
> represent complete characters. At this point I think of text as simple
> binary blobs. What charset, character encoding and termination they use
> should not be exposed in the interface used to operate on them.
>
> But now I have a dilemma. C string literals are signed char *. With GCC
> 4 warning about every sign mismatch, my code is spewing warnings all
> over the place and I'm trying to figure out what to do about it.

[...]

No, C string literals have type 'array[N] of char'; in most, but not
all, contexts, this is implicitly converted to 'char*'. (Consider
'sizeof "hello, world"'.)

My main point isn't that they're arrays rather than pointers, but that
they're arrays of (plain) char, not of signed char. Plain char is
equivalent to *either* signed char or unsigned char, but is still a
distinct type from either of them. It appears that plain char is
signed in your implementation.
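
Both claims can be checked with a short sketch. It assumes a C11 compiler (for static_assert and _Generic); the function and macro names are invented for illustration. sizeof applied to the literal yields the array size, 12 characters plus the terminating zero, rather than the size of a pointer, and _Generic, which selects on the exact type, reports the element type as plain char whether or not char happens to be signed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The literal is an array of 13 char, not a pointer. */
static_assert(sizeof "hello, world" == 13,
              "12 characters plus the terminating zero");

/* Hypothetical helper: size of the literal taken as an array. */
static size_t
literal_array_size(void)
{
    return sizeof "hello, world";
}

/* Hypothetical helper: 1 if plain char is signed here, else 0. */
static int
plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* _Generic distinguishes the three character types even though plain
   char shares its representation with one of the other two. */
#define CHAR_KIND(x) _Generic((x),          \
    char:          "char",                  \
    signed char:   "signed char",           \
    unsigned char: "unsigned char")

/* Hypothetical helper: names the element type of a string literal. */
static const char *
kind_of_literal_element(void)
{
    return CHAR_KIND("hello, world"[0]);
}
```

kind_of_literal_element returns "char" on every conforming implementation; only the result of plain_char_is_signed varies between platforms.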

I know this doesn't answer your actual question; hopefully someone
else can help with that.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"


Michael B Allen wrote:

> Hello,
>
> Early on I decided that all text (what most people call "strings" [1])
> in my code would be unsigned char *.
> The reasoning is that the elements
> of these arrays are decidedly not signed. In fact, they may not even
> represent complete characters. At this point I think of text as simple
> binary blobs. What charset,
> character encoding and termination they use
> should not be exposed in the interface used to operate on them.
>
> But now I have a dilemma. C string literals are signed char *.

They are arrays of plain char,
which may be either a signed or unsigned type.

> With GCC
> 4 warning about every sign mismatch, my code is spewing warnings all
> over the place and I'm trying to figure out what to do about it.
>
> My current thought is to define a Windows style _T macro:
>
>     #define _T(s) ((unsigned char *)s)
>
> Use "text" functions like:
>
>     int
>     text_copy(const unsigned char *src, unsigned char *dst, int n)
>     {
>         while (n-- && *src) {
>             *dst++ = *src++;
>         ...
>
> And abolish the use of traditional string functions
> (at least for "text").
>
> The code might then look like the following:
>
>     unsigned char buf[255];
>     text_copy(_T("hello, world"), buf, sizeof(buf));
>
> What do you think?
>
> If I do the above I have a lot of work to do
> so if someone has a better idea
> I'd really like to hear about it.
>
> Mike
>
> PS: If you have an opinion that is unfavorable
> (but professional) let's hear it.

The solution is obvious: use arrays of char to contain strings.

Using arrays of unsigned char to hold strings
creates a problem for you, but solves nothing.

If I have a problem
that is caused by using arrays of char to hold strings,
I'm unaware of what the problem is.
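
For comparison, here is a minimal sketch of the same usage with plain char arrays. The helper copy_string is hypothetical, and using snprintf for the bounded copy is one choice among several, not something the thread prescribes. No macro or cast is needed, because string literals already have the element type the function expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: bounded copy of a string into a char buffer.
   snprintf always terminates the destination when dst_size > 0.
   Returns 1 if the whole string fit, 0 if it was truncated. */
static int
copy_string(char *dst, size_t dst_size, const char *src)
{
    int written;

    assert(dst != NULL && src != NULL && dst_size > 0);
    written = snprintf(dst, dst_size, "%s", src);
    return written >= 0 && (size_t)written < dst_size;
}
```

A call such as copy_string(buf, sizeof buf, "hello, world") with char buf[255] compiles without sign-mismatch warnings, since both arguments are plain char.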

--
pete

