C strings confusion


Problem description

I'm learning C right now and got a bit confused with character arrays - strings.

char name[15]="Fortran";

No problem with this - it's an array that can hold (up to?) 15 chars

char name[]="Fortran";

C counts the number of characters for me so I don't have to - neat!

char* name;

Okay. What now? All I know is that this can hold a big number of characters that are assigned later (e.g. via user input), but

  • Why do they call this a char pointer? I know of pointers as references to variables
  • Is this an "excuse"? Does this find any other use than in char*?
  • What is this actually? Is it a pointer? How do you use it correctly?

thanks in advance, lamas

Solution

I think this can be explained this way, since a picture is worth a thousand words...

We'll start off with char name[] = "Fortran", which is an array of chars whose length is known at compile time: 7 to be exact, right? Wrong! It is 8, because a '\0' (the NUL terminating character) is stored at the end, and all strings have to have that.

char name[] = "Fortran";
+======+     +-+-+-+-+-+-+-+--+
|0x1234|     |F|o|r|t|r|a|n|\0|
+======+     +-+-+-+-+-+-+-+--+ 
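To check the 7-versus-8 claim for yourself, here is a minimal sketch. The function names fortran_array_size and fortran_length are made up for this illustration; sizeof and strlen are the standard facilities.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes occupied by the array, including the trailing '\0'. */
size_t fortran_array_size(void)
{
    char name[] = "Fortran";
    assert(name[sizeof name - 1] == '\0');   /* last byte is the terminator */
    return sizeof name;
}

/* Characters before the '\0', as counted by strlen. */
size_t fortran_length(void)
{
    char name[] = "Fortran";
    return strlen(name);
}
```

The first function should report 8 and the second 7: strlen stops at the terminator, while sizeof counts it.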

At link time, the compiler and linker give the symbol name a memory address, 0x1234 in this example. With the subscript operator, e.g. name[1], the compiler knows how to calculate where in memory the character at that offset is: 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough. Furthermore, the C standard defines sizeof(char) as 1, which explains how an expression such as name[cnt++] is evaluated: assuming cnt is an integer with a value of 3, the element at offset 3 (counting from zero) is read, which is 't', and cnt then steps up by one to 4. So far so good.

What happens if name[12] is executed? The behavior is undefined: the code may crash, or you may get garbage, since the array only spans index/offset 0 (0x1234) up to 7 (0x123B). Anything after that does not belong to the name variable. Reading there is an out-of-bounds access, and writing there would be a buffer overflow!
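C will not stop you from indexing past the end, so any checking is up to you. A minimal sketch of a bounds-checked read follows; char_at is a hypothetical helper written for this answer, not a standard function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies buf[index] into *out only when index is
   inside the array; returns false instead of reading out of bounds. */
bool char_at(const char *buf, size_t size, size_t index, char *out)
{
    assert(buf != NULL && out != NULL);
    if (index >= size) {
        return false;
    }
    *out = buf[index];
    return true;
}
```

With char name[] = "Fortran", calling char_at(name, sizeof name, 1, &ch) succeeds and stores 'o', while index 12 is rejected and ch is left untouched.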

The address of name in memory is 0x1234 in this example, so if you were to do this:

printf("The address of name is %p\n", (void *)&name);

Output would be:
The address of name is 0x00001234

For the sake of brevity and keeping with the example, the memory addresses are 32-bit, hence the extra 0's (the exact format printed by %p depends on the implementation). Fair enough? Right, let's move on.

Now on to pointers... char *name is a pointer to char.

Edit: And we initialize it to NULL as shown. Thanks Dan for pointing out the little error...

char *name = (char*)NULL;
+======+     +======+ 
|0x5678| ->  |0x0000|    ->    NULL
+======+     +======+ 

At compile/link time, name does not point to anything, but the symbol name itself has a compile/link-time address (0x5678). The value stored there, i.e. the address the pointer points to, is NULL, shown here as 0x0000.

Now remember, this is crucial: when dealing with pointers of any type, the address of the symbol is known at compile/link time, but the address it points to is not known until runtime.

Suppose we do this:

name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");

We called malloc to allocate a memory block of 21 bytes: 20 for the characters, plus the 1 I added on to the size for the '\0' NUL terminating character. Suppose that at runtime the address returned was 0x9876:
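In real code malloc can fail and return NULL, so the result should be checked before copying into it. Here is a sketch of the same allocate-and-copy step with that check added; copy_string is a hypothetical helper, and it sizes the block from the source string rather than using a fixed 20.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if malloc
   fails. The caller owns the result and must free() it. */
char *copy_string(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *copy = malloc(len + 1);   /* +1 for the terminating '\0' */
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len + 1);     /* copies the '\0' as well */
    return copy;
}
```

A call such as char *name = copy_string("Fortran"); gives you an 8-byte block holding the string, which you later hand to free.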

char *name;
+======+     +======+          +-+-+-+-+-+-+-+--+
|0x5678| ->  |0x9876|    ->    |F|o|r|t|r|a|n|\0|
+======+     +======+          +-+-+-+-+-+-+-+--+

So when you do this:

printf("The value of name is %p\n", (void *)name);
printf("The address of name is %p\n", (void *)&name);

Output would be:
The value of name is 0x00009876
The address of name is 0x00005678
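Printed addresses differ from run to run, so a comparison is easier to check than the output itself. The sketch below (both function names are invented for this illustration) tests whether a variable's own address equals the address of the data it gives you access to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* For an array, the array and its first element start at the same address. */
bool array_address_equals_data(void)
{
    char name[] = "Fortran";
    assert(sizeof name == 8);       /* 7 letters plus '\0' */
    return (void *)&name == (void *)&name[0];
}

/* For a pointer, the variable lives at one address and holds another. */
bool pointer_address_equals_data(void)
{
    char *name = malloc(8);
    if (name == NULL) {
        return false;               /* allocation failed; nothing to compare */
    }
    strcpy(name, "Fortran");
    bool same = (void *)&name == (void *)name;
    free(name);
    return same;
}
```

The array version returns true and the pointer version returns false, which is the 0x5678 versus 0x9876 split in the diagram.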

Now, this is where the illusion that 'arrays and pointers are the same' comes into play.

When we do this:

char ch = name[1];

What happens at runtime is this:

  1. The address of symbol name is looked up
  2. Fetch the memory address of that symbol, i.e. 0x5678.
  3. That address contains another address, a pointer to memory; fetch it, i.e. 0x9876.
  4. Take the subscript value of 1 as the offset and add it to the pointer address, giving 0x9877; retrieve the value at that memory address, i.e. 'o', and assign it to ch.

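The above is crucial to understanding the distinction: the difference between arrays and pointers is how the data is fetched. With a pointer there is an extra indirection, because the pointer's own value has to be loaded first.

Remember, in most expressions an array of type T decays into a pointer to its first element of type T. The main exceptions are when the array is the operand of sizeof or of unary &, or is a string literal used to initialize an array.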
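The sizeof operator is the easiest place to see that decay does not always happen. A small sketch, with function names made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* sizeof applied to the array itself: no decay, so this is the array size. */
size_t size_of_array(void)
{
    char name[] = "Fortran";
    return sizeof name;
}

/* Assigning the array to a pointer makes it decay; sizeof then
   reports the size of the pointer, not of the array. */
size_t size_after_decay(void)
{
    char name[] = "Fortran";
    const char *p = name;           /* decays to &name[0] */
    assert(p == &name[0]);
    return sizeof p;
}
```

The first function gives 8, the second gives sizeof(char *), whatever that is on your platform.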

When we do this:

char ch = *(name + 5);

  1. The address of symbol name is looked up
  2. Fetch the memory address of that symbol, i.e. 0x5678.
  3. That address contains another address, a pointer to memory; fetch it, i.e. 0x9876.
  4. Take the value of 5 as the offset and add it to the pointer address, giving 0x987B; retrieve the value at that memory address, i.e. 'a', and assign it to ch.

Incidentally, you can do the same with the array of chars...

Furthermore, when the subscript operator is used on an array, i.e. char name[] = "...";, the expression name[subscript_value] is really the same as *(name + subscript_value), i.e.

name[3] is the same as *(name + 3)

And since the addition in *(name + subscript_value) is commutative, the reverse also holds:

*(subscript_value + name) is the same as *(name + subscript_value)

Hence, this explains why, as in one of the answers above, you can write it like this (the practice is not recommended, even though it is quite legitimate!):

3[name]
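If you want to convince yourself that the three spellings really are interchangeable, a minimal sketch like this will do; subscript_forms_agree is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when all three spellings read the same element. */
bool subscript_forms_agree(const char *name, size_t i)
{
    assert(name != NULL);
    char a = name[i];
    char b = *(name + i);
    char c = i[name];       /* legal but unidiomatic */
    return a == b && b == c;
}
```

For "Fortran" and an index of 3, all three forms read 't'.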

Ok, how do I get the value the pointer points to? That is what the * is used for. Suppose the pointer name now holds the memory address 0x9878 (again referring to the above example); this is how it is done:

char ch = *name;

This means: obtain the value pointed to by the memory address 0x9878, so ch will now have the value 'r'. This is called dereferencing. We just dereferenced the name pointer to obtain the value and assign it to ch.

Also, the compiler knows that sizeof(char) is 1, hence you can do pointer increment/decrement operations like this:

*name++;
*name--;

As a result, the pointer automatically steps up/down by one element (one byte for a char pointer).

When we do this, assuming the pointer holds the memory address 0x9878:

char ch = *name++;

What is the value of ch, and what is the address? Because ++ here is a post-increment, ch is assigned 'r' (the value at 0x9878), and the pointer then moves on to 0x9879, so *name now yields 't'.
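The order of events in *name++ trips up a lot of people, so here is a sketch that makes it checkable. read_and_advance is a hypothetical helper that does the same thing through a pointer to the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the char under *cursor, then advances
   *cursor by one element (the same effect as ch = *name++). */
char read_and_advance(const char **cursor)
{
    assert(cursor != NULL && *cursor != NULL);
    return *(*cursor)++;
}
```

Starting with a pointer at the third character of "Fortran", the call returns 'r' and leaves the pointer on 't'.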

This is where you have to be careful too. In the same principle and spirit as what was stated earlier about memory boundaries in the very first part (see 'What happens if name[12] was executed' above), stepping the pointer outside the allocated block and dereferencing it is undefined behavior, i.e. the code may crash and burn!

Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name)? Note that free does not change name itself: it still holds the old address 0x9876 (a dangling pointer) until you assign name = NULL, which gives this picture:

+======+     +======+ 
|0x5678| ->  |0x0000|    ->    NULL
+======+     +======+ 

Yes, the block of memory is freed and handed back to the runtime environment, to be reused by a later call to malloc.
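Since free leaves the pointer variable untouched, some people wrap the two steps together. This is only a sketch of that habit; release is a hypothetical helper, not part of the standard library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees *p and sets it to NULL so the caller
   is not left holding a dangling pointer. */
void release(char **p)
{
    assert(p != NULL);
    free(*p);       /* free(NULL) is a no-op, so repeated calls are safe */
    *p = NULL;
}
```

After release(&name), name compares equal to NULL, and calling release(&name) a second time does no harm.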

Now, this is where the familiar 'Segmentation fault' comes into play: since name no longer points to anything, what happens when we dereference it, i.e.

char ch = *name;

Yes, the behavior is undefined, and in practice the code will usually crash and burn with a 'Segmentation fault' under Unix/Linux. Under Windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'. Likewise, any attempt to dereference a pointer that has not been malloc'd (or otherwise pointed at valid memory) is undefined behavior and is likely to crash and burn.

Also remember this: every malloc needs a corresponding free. If there is no corresponding free, you have a memory leak, in which memory is allocated but never freed.

And there you have it: that is how pointers work and how arrays differ from pointers. If you are reading a textbook that says they are the same, tear out that page and rip it up! :)

I hope this is of help to you in understanding pointers.
