Why does the buffer overflow in this code behave differently from what I expect?
Question
I have this program:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void main(void) {
    char *buffer1 = malloc(sizeof(char));
    char *buffer2 = malloc(sizeof(char));
    strcpy(buffer2, "AA");
    printf("before: buffer1 %s\n", buffer1);
    printf("before: buffer2 %s\n", buffer2);
    printf("address, buffer1 %p\n", &buffer1);
    printf("address, buffer2 %p\n", &buffer2);
    strcpy(buffer1, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB");
    printf("after: buffer1 %s\n", buffer1);
    printf("after: buffer2 %s\n", buffer2);
}
which prints:
before: buffer1
before: buffer2 AA
address, buffer1 0x7ffc700460d8
address, buffer2 0x7ffc700460d0
after: buffer1 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
after: buffer2 B
What I expect this code to do:
- As a char is 8 bits long, I expect that both buffers have a size of 1 byte (8 bits).
- One ASCII char is 7 bits long, so I expect that two characters fit into each buffer.
- As I allocate two one-byte buffers directly after each other, I expect them to be directly next to each other in memory. Therefore, I expect the difference between the two addresses to be 1 (since memory is byte-addressed?), and not 8 as my little program printed.
- As they are directly next to each other in memory, I expect buffer2 to be overflowed with BB when I do strcpy(buffer1, BBBB); as the first BB are written to buffer1 and the rest overflows into buffer2. Therefore, I'd expect that strcpy(buffer1, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"); produces:
  - a buffer overflow in buffer2, so that it has the value BBBBBBBBBBBBBBBBBBBBBBBBBBBBB or so.
    - How I calculated that: the number of B's that have been strcpy'd, minus 4 B's for both buffers.
  - a segmentation fault. I have only allocated 2 bytes (since buffer1 and buffer2 together are 2 bytes in size). Since BBBBBBBBBBBBBBBBBBBBBBBBB fits into neither buffer1 nor buffer2 (because both are already filled), it would overflow into the next memory after buffer2. And because I have not allocated that, I'd expect a segmentation fault.
Therefore, I would like to ask: why does my program behave differently from what I expect? Where did I misunderstand things?
I have an x86_64 architecture, and the program above was compiled with gcc version 6.3.1 20170306 (GCC).
- I know that strcpy does no bounds checking, and its use here is intentional. I want to investigate buffer overflows and such. Therefore, please don't write an answer or comment saying that I should use a different function than strcpy.
Recommended answer
- As a char is 8 bits long, ...
This is correct for the stated architecture and operating system. (The C standard allows char to be more than 8 bits long, but this is very rare nowadays; the only example I know of is the TMS320 family of DSPs, where char may be 16 bits. It's not allowed to be smaller.)
Note that sizeof(char) == 1 by definition, and therefore it is generally considered bad style to write sizeof(char) or foo * sizeof(char) in your code.
- ... I expect that both buffers have the size of 1 byte/8 bits.
This is also correct (but see below).
- One ASCII char is 7 bits long, I expect that two characters fit into each buffer.
This is not correct, for two reasons. First, nobody uses 7-bit ASCII anymore; each character is in fact eight bits long. Second, two seven-bit characters do not fit into one eight-bit buffer. I see that there is some confusion on this point in the comments on the question, so let me attempt to explain further: seven bits can represent 2^7 different values, just enough room for the 128 different characters defined by the original ASCII standard. Two seven-bit characters, together, can have 128 * 128 = 16384 = 2^14 different values; that requires 14 bits to represent, and will not fit into eight bits. You seem to have thought it was only 2 * 128 = 2^8, which would fit into eight bits, but that's not right; it would mean that once you saw the first character, there were only two possibilities for the second character, not 128.
- As I allocate two buffers of one byte directly after each other, I expect that they are directly next to each other in memory. Therefore, I expect that the difference between the addresses is 1 (since memory is byte-addressed?), and not 8 as my little program has printed.
As you have observed for yourself, your expectations are incorrect.
malloc is not required to put consecutive allocations next to each other; in fact, "are these allocations next to each other" may not be a meaningful question. The C standard goes out of its way to avoid requiring there to be any meaningful comparison between two pointers that don't point into the same array.
Now, you are working on a system with a "flat address space", so it is meaningful to compare pointers from successive allocations (provided you do it in your own brain, not with code) and there is a logical explanation for the gap between the allocations, but first I have to point out that you printed the wrong addresses:
printf("address, buffer1 %p\n", &buffer1);
printf("address, buffer2 %p\n", &buffer2);
This prints the addresses of the pointer variables, not the addresses of the buffers. You should have written
printf("address, buffer1 %p\n", (void *)buffer1);
printf("address, buffer2 %p\n", (void *)buffer2);
(The cast to void * is required because printf takes a variable argument list.) If you had written that, you would have seen output similar to
address, buffer1 0x55583d9bb010
address, buffer2 0x55583d9bb030
and the important thing to notice is that these addresses are 32 bytes apart (0x30 - 0x10 = 0x20), which is a multiple of 16, and not only that, they're both evenly divisible by 16.
malloc is required to produce buffers that are aligned as required for any type, even if you can't fit a value of that type into the allocation. An address is aligned to some number of bytes if it's evenly divisible by that number. On your system, the maximum alignment requirement is 16; you can confirm this by running this program:
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
int main(void) {
    printf("%zu\n", alignof(max_align_t));
    return 0;
}
So that means all addresses returned by malloc must be evenly divisible by 16. Therefore, when you ask malloc for two one-byte buffers, it has to leave a gap of at least fifteen bytes between them. This does not mean that malloc rounded the size up; the C standard specifically forbids you to access the bytes in the gap. (I'm not aware of any modern, commercial CPUs that can enforce that prohibition, but debugging tools like valgrind will, and there have been experimental CPU designs that can do it. Also, often the space immediately before or after a malloc block contains data used internally by the malloc implementation, which you must not tamper with.)
There's a similar gap after the second allocation.
- As they are directly next to each other in memory, I expect buffer2 to be overflowed with BB when I do strcpy(buffer1, BBBB); as the first BB are written to buffer1 and the rest overflows into buffer2.
As previously discussed, they are not directly next to each other in memory, and each B takes up eight bits. With the allocations 32 bytes apart, as in the sample addresses 0x55583d9bb010 and 0x55583d9bb030, the 33 B's and the terminating NUL land as follows: one B is written to your first allocation, the next 31 to the gap between the two allocations, the 33rd to the second allocation, and the NUL to the gap right after the second allocation. That is why buffer2 prints as a single B afterwards.
- I have only allocated 2 bytes (since buffer1 and buffer2 together are 2 bytes in size). Since BBBBBBBBBBBBBBBBBBBBBBBBB fits into neither buffer1 nor buffer2 (because both are already filled), it would overflow into the next memory after buffer2. And because I have not allocated that, I'd expect a segmentation fault.
We've already discussed why your calculations were incorrect, but you did write past the end of both allocations, so why no segfault? This is because, at the level of operating system primitives, memory is allocated to applications in units called "pages", which are larger than the amount of memory you asked for. The CPU can only detect a buffer overrun and trigger a segmentation fault if the overrun crosses a page boundary into memory that is not mapped. You just didn't go far enough. I experimented with your program on my computer, which is very similar, and I need to write 132 kilobytes (a kilobyte is 1024 bytes) (some people say that that's supposed to be called a kibibyte; they are wrong) beyond the end of buffer1 to get a segfault. Pages on my computer are only 4 kilobytes each, but malloc asks the OS for memory in even larger chunks because system calls are expensive.
Not getting a prompt segfault does not mean you are safe; there is an excellent chance you clobbered malloc's internal data, or another allocation somewhere beyond your buffers. If I take your original program and add a call to free(buffer1) at the end, it crashes in there.