从 K&R 书中解释 malloc 的这种实现 [英] Explain this implementation of malloc from the K&R book

查看:40
本文介绍了从 K&R 书中解释 malloc 的这种实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是Kernighan 和 Ritchie 关于 C 的书的摘录.它展示了如何实现一个版本的 malloc.虽然评论很好,但我很难理解它.有人可以解释一下吗?

This is an excerpt from the book on C by Kernighan and Ritchie. It shows how to implement a version of malloc. Although well commented, I am having great difficulty in understanding it. Can somebody please explain it?

typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;

static Header base; /* empty list to get started */
static Header *freep = NULL; /* start of free list */
/* malloc: general-purpose storage allocator */
void *malloc(unsigned nbytes)
{
   Header *p, *prevp;
   Header *morecore(unsigned);
   unsigned nunits;
   nunits = (nbytes+sizeof(Header)-1)/sizeof(header) + 1;
   if ((prevp = freep) == NULL) { /* no free list yet */
      base.s.ptr = freeptr = prevptr = &base;
      base.s.size = 0;
   }
   for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
      if (p->s.size >= nunits) { /* big enough */
        if (p->s.size == nunits) /* exactly */
           prevp->s.ptr = p->s.ptr;
        else { /* allocate tail end */
           p->s.size -= nunits;
           p += p->s.size;
           p->s.size = nunits
             }
        freep = prevp;
        return (void *)(p+1);
      }
      if (p == freep) /* wrapped around free list */
         if ((p = morecore(nunits)) == NULL)
             return NULL; /* none left */
      }
}

#define NALLOC 1024 /* minimum #units to request */
/* morecore: ask system for more memory */

static Header *morecore(unsigned nu)
{

  char *cp, *sbrk(int);
  Header *up;

  if (nu < NALLOC)
    nu = NALLOC;

  cp = sbrk(nu * sizeof(Header));

  if (cp == (char *) -1) /* no space at all */
    return NULL;

  up = (Header *) cp;
  up->s.size = nu;
  free((void *)(up+1));

  return freep;
}

/* free: put block ap in free list */
void free(void *ap) {
  Header *bp, *p;
  bp = (Header *)ap - 1; /* point to block header */
  for (p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
    if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
      break; /* freed block at start or end of arena */
  if (bp + bp->size == p->s.ptr) {
    bp->s.size += p->s.ptr->s.size;
    bp->s.ptr = p->s.ptr->s.ptr;
  } else
      bp->s.ptr = p->s.ptr;

  if (p + p->size == bp) {
    p->s.size += bp->s.size;
    p->s.ptr = bp->s.ptr;
  } else
    p->s.ptr = bp;
  freep = p;
}

推荐答案

好吧,我们这里有一大块写得很糟糕的代码.我将在这篇文章中做的最好的描述是软件考古.

Ok, what we have here is a chunk of really poorly written code. What I will do in this post could best be described as software archaeology.

第 1 步:修正格式.

缩进和紧凑格式对任何人都没有任何好处.需要插入各种空格和空行.注释可以用更易读的方式编写.我会先解决这个问题.

The indention and compact format doesn't do anyone any good. Various spaces and empty rows need to be inserted. The comments could be written in more readable ways. I'll start by fixing that.

同时我正在将支架样式从 K&R 样式更改-请注意 K&R 支架样式是可以接受的,这只是我的个人喜好.另一个个人偏好是在指向的类型旁边写 * 表示指针.我不会在这里争论(主观)风格问题.

At the same time I'm changing the brace style from K&R style - please note that the K&R brace style is acceptable, this is merely a personal preference of mine. Another personal preference is to write the * for pointers next to the type pointed at. I'll not argue about (subjective) style matters here.

另外,Header 的类型定义完全不可读,需要彻底修复.

Also, the type definition of Header is completely unreadable, it needs a drastic fix.

而且我发现了一些完全模糊的东西:他们似乎在函数内部声明了一个函数原型.Header* morecore(unsigned);.这是非常古老且非常糟糕的风格,我不确定 C 是否还允许它再使用.让我们删除那行,不管那个函数做什么,它都必须在别处定义.

And I spotted something completely obscure: they seem to have declared a function prototype inside the function. Header* morecore(unsigned);. This is very old and very poor style, and I'm not sure if C even allows it any longer. Lets just remove that line, whatever that function does, it will have to be defined elsewhere.

typedef long Align;                      /* for alignment to long boundary */

typedef union header                     /* block header */
{
  struct
  {
    union header *ptr;                   /* next block if on free list */
    unsigned size;                       /* size of this block */
  } s;

  Align x;                               /* force alignment of blocks */

} Header;


static Header base;                      /* empty list to get started */
static Header* freep = NULL;             /* start of free list */


/* malloc: general-purpose storage allocator */
void* malloc (unsigned nbytes)
{
  Header*   p;
  Header*   prevp;
  unsigned  nunits;

  nunits = (nbytes + sizeof(Header) - 1) / sizeof(header) + 1;

  if ((prevp = freep) == NULL)           /* no free list yet */
  {
    base.s.ptr = freeptr = prevptr = &base;
    base.s.size = 0;
  }

  for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr)
  {
    if (p->s.size >= nunits)             /* big enough */
    {
      if (p->s.size == nunits)           /* exactly */
        prevp->s.ptr = p->s.ptr;
      else                               /* allocate tail end */
      {
        p->s.size -= nunits;
        p += p->s.size;
        p->s.size = nunits
      }

      freep = prevp;
      return (void *)(p+1);
    }

    if (p == freep)                      /* wrapped around free list */
      if ((p = morecore(nunits)) == NULL)
        return NULL;                     /* none left */
  }
}

好的,现在我们可能真的能够阅读代码了.

Ok now we might actually be able to read the code.

第 2 步:淘汰广为人知的不良做法.

这段代码充满了当今被认为是不好的做法.它们需要被移除,因为它们会危及代码的安全性、可读性和维护性.如果您想参考宣扬与我相同实践的权威,请查看广泛认可的编码标准 MISRA-C.

This code is filled with things that are nowadays regarded as bad practice. They need to be removed, since they jeopardize the safety, readability and maintenance of the code. If you want a reference to an authority preaching the same practices as me, check out the widely-recognized coding standard MISRA-C.

我发现并删除了以下不良做法:

I have spotted and removed the following bad practices:

1) 仅在代码中键入 unsigned 可能会导致混淆:这是程序员的打字错误还是打算编写 unsigned int?我们应该用 unsigned int 替换所有的 unsigned.但是当我们这样做时,我们发现它在此上下文中用于给出各种二进制数据的大小.用于此类事务的正确类型是 C 标准类型 size_t.这本质上也只是一个 unsigned int,但它保证对于特定平台足够大".sizeof 运算符返回 size_t 类型的结果,如果我们查看 C 标准对真实 malloc 的定义,它是 void *malloc(size_t size);.所以 size_t 是最正确的类型.

1) Just typing unsigned in the code could lead to be confusion: was this a typo by the programmer or was the intention to write unsigned int? We should replace all unsigned with unsigned int. But as we do that, we find that it is used in this context to give the size of various binary data. The correct type to use for such matters is the C standard type size_t. This is essentially just an unsigned int as well, but it is guaranteed to be "large enough" for the particular platform. The sizeof operator returns a result of type size_t and if we look at the C standard's definition of the real malloc, it is void *malloc(size_t size);. So size_t is the most correct type to use.

2) 为我们自己的 malloc 函数使用与 stdlib.h 中相同的名称是一个坏主意.如果我们需要包含 stdlib.h,事情就会变得一团糟.根据经验,永远不要在您自己的代码中使用 C 标准库函数的标识符名称.我将名称更改为 kr_malloc.

2) It is a bad idea to use the same name for our own malloc function as the one residing in stdlib.h. Should we need to include stdlib.h, things will get messy. As a rule of thumb, never use identifier names of C standard library functions in your own code. I'll change the name to kr_malloc.

3) 代码滥用了所有静态变量都保证初始化为零的事实.C 标准对此进行了明确定义,但这是一个相当微妙的规则.让我们显式地初始化所有静态变量,以表明我们没有忘记意外地初始化它们.

3) The code is abusing the fact that all static variables are guaranteed to be initialized to zero. This is well-defined by the C standard, but a rather subtle rule. Lets initialize all statics explicitly, to show that we haven't forgotten to init them by accident.

4) 条件内部的赋值很危险且难以阅读.如果可能,应该避免这种情况,因为它也可能导致错误,例如经典的 = vs == 错误.

4) Assignment inside conditions is dangerous and hard to read. This should be avoided if possible, since it can also lead to bugs, such as the classic = vs == bug.

5) 由于求值顺序,同一行上的多个赋值很难阅读,而且可能很危险.

5) Multiple assignments on the same row is hard to read, and also possibly dangerous, because of the order of evaluation.

6) 同一行上的多个声明很难阅读,而且很危险,因为在混合数据和指针声明时可能会导致错误.始终在自己的一行中声明每个变量.

6) Multiple declarations on the same row is hard to read, and dangerous, since it could lead to bugs when mixing data and pointer declarations. Always declare each variable on a row of its own.

7) 在每个语句之后总是使用大括号.不这样做会导致错误错误错误.

7) Always uses braces after every statement. Not doing so will lead to bugs bugs bugs.

8) 切勿从特定指针类型强制转换为 void*.它在 C 中是不必要的,并且可以隐藏编译器本来会检测到的错误.

8) Never type cast from a specific pointer type to void*. It is unnecessary in C, and could hide away bugs that the compiler would otherwise have detected.

9) 避免在函数内使用多个 return 语句.有时它们会导致更清晰的代码,但在大多数情况下它们会导致意大利面.就代码而言,我们不能在不重写循环的情况下改变它,所以我稍后会解决这个问题.

9) Avoid using multiple return statements inside a function. Sometimes they lead to clearer code, but in most cases they lead to spaghetti. As the code stands, we can't change that without rewriting the loop though, so I will fix this later.

10) 保持 for 循环简单.它们应该包含一个 init 语句、一个循环条件和一个迭代,没有别的.这个带有逗号运算符和所有内容的 for 循环非常晦涩.再次,我们发现需要将此循环重写为理智的东西.我接下来会这样做,但现在我们有:

10) Keep for loops simple. They should contain one init statement, one loop condition and one iteration, nothing else. This for loop, with the comma operator and everything, is very obscure. Again, we spot a need to rewrite this loop into something sane. I'll do this next, but for now we have:

typedef long Align;                      /* for alignment to long boundary */

typedef union header                     /* block header */
{
  struct
  {
    union header *ptr;                   /* next block if on free list */
    size_t size;                         /* size of this block */
  } s;

  Align x;                               /* force alignment of blocks */

} Header;


static Header base = {0};                /* empty list to get started */
static Header* freep = NULL;             /* start of free list */


/* malloc: general-purpose storage allocator */
void* kr_malloc (size_t nbytes)
{
  Header*  p;
  Header*  prevp;
  size_t   nunits;

  nunits = (nbytes + sizeof(Header) - 1) / sizeof(header) + 1;

  prevp = freep;
  if (prevp == NULL)                     /* no free list yet */
  {
    base.s.ptr  = &base;
    freeptr     = &base;
    prevptr     = &base;
    base.s.size = 0;
  }

  for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr)
  {
    if (p->s.size >= nunits)             /* big enough */
    {
      if (p->s.size == nunits)           /* exactly */
      {
        prevp->s.ptr = p->s.ptr;
      }
      else                               /* allocate tail end */
      {
        p->s.size -= nunits;
        p += p->s.size;
        p->s.size = nunits
      }

      freep = prevp;
      return p+1;
    }

    if (p == freep)                      /* wrapped around free list */
    {
      p = morecore(nunits);
      if (p == NULL)
      {
        return NULL;                     /* none left */
      }
    }
  } /* for */
}

第 3 步:重写晦涩的循环.

由于前面提到的原因.我们可以看到这个循环永远持续下去,它通过从函数返回而终止,无论是在分配完成时,还是在没有剩余内存时.因此,让我们将其创建为循环条件,并将返回提升到函数的末尾.让我们摆脱那个丑陋的逗号运算符.

For the reasons mentioned earlier. We can see that this loop goes on forever, it terminates by returning from the function, either when the allocation is done, or when there is no memory left. So lets create that as a loop condition, and lift out the return to the end of the function where it should be. And lets get rid of that ugly comma operator.

我将引入两个新变量:一个结果变量用于保存结果指针,另一个用于跟踪循环是否应该继续.我将使用 bool 类型来让 K&R 大开眼界,它自 1999 年以来就成为 C 语言的一部分.

I'll introduce two new variables: one result variable to hold the resulting pointer, and another to keep track of whether the loop should continue or not. I'll blow K&R's minds by using the bool type, which is part of the C language since 1999.

(我希望我没有因为这个改变而改变算法,我相信我没有)

(I hope I haven't altered the algorithm with this change, I believe I haven't)

#include <stdbool.h>

typedef long Align;                      /* for alignment to long boundary */

typedef union header                     /* block header */
{
  struct
  {
    union header *ptr;                   /* next block if on free list */
    size_t size;                         /* size of this block */
  } s;

  Align x;                               /* force alignment of blocks */

} Header;


static Header base = {0};                /* empty list to get started */
static Header* freep = NULL;             /* start of free list */


/* malloc: general-purpose storage allocator */
void* kr_malloc (size_t nbytes)
{
  Header*  p;
  Header*  prevp;
  size_t   nunits;
  void*    result;
  bool     is_allocating;

  nunits = (nbytes + sizeof(Header) - 1) / sizeof(header) + 1;

  prevp = freep;
  if (prevp == NULL)                     /* no free list yet */
  {
    base.s.ptr  = &base;
    freeptr     = &base;
    prevptr     = &base;
    base.s.size = 0;
  }

  is_allocating = true;
  for (p = prevp->s.ptr; is_allocating; p = p->s.ptr)
  {
    if (p->s.size >= nunits)             /* big enough */
    {
      if (p->s.size == nunits)           /* exactly */
      {
        prevp->s.ptr = p->s.ptr;
      }
      else                               /* allocate tail end */
      {
        p->s.size -= nunits;
        p += p->s.size;
        p->s.size = nunits
      }

      freep = prevp;
      result = p+1;
      is_allocating = false;             /* we are done */
    }

    if (p == freep)                      /* wrapped around free list */
    {
      p = morecore(nunits);
      if (p == NULL)
      {
        result = NULL;                   /* none left */
        is_allocating = false;
      }
    }
    prevp = p;
  } /* for */

  return result;
}

第 4 步:编译这个垃圾.

由于这是来自 K&R,所以它充满了错别字.sizeof(header) 应该是 sizeof(Header).缺少分号.它们使用不同的名称 freep、prevp 与 freeptr、prevptr,但显然表示相同的变量.我相信后者实际上是更好的名字,所以让我们使用它们.

Since this is from K&R, it is filled with typos. sizeof(header) should be sizeof(Header). There are missing semi-colons. They use different names freep, prevp versus freeptr, prevptr, but clearly mean the same variable. I believe the latter were actually better names, so lets use those.

#include <stdbool.h>

typedef long Align;                      /* for alignment to long boundary */

typedef union header                     /* block header */
{
  struct
  {
    union header *ptr;                   /* next block if on free list */
    size_t size;                         /* size of this block */
  } s;

  Align x;                               /* force alignment of blocks */

} Header;


static Header base = {0};                /* empty list to get started */
static Header* freeptr = NULL;           /* start of free list */


/* malloc: general-purpose storage allocator */
void* kr_malloc (size_t nbytes)
{
  Header*  p;
  Header*  prevptr;
  size_t   nunits;
  void*    result;
  bool     is_allocating;

  nunits = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;

  prevptr = freeptr;
  if (prevptr == NULL)                   /* no free list yet */
  {
    base.s.ptr  = &base;
    freeptr     = &base;
    prevptr     = &base;
    base.s.size = 0;
  }

  is_allocating = true;
  for (p = prevptr->s.ptr; is_allocating; p = p->s.ptr)
  {
    if (p->s.size >= nunits)             /* big enough */
    {
      if (p->s.size == nunits)           /* exactly */
      {
        prevptr->s.ptr = p->s.ptr;
      }
      else                               /* allocate tail end */
      {
        p->s.size -= nunits;
        p += p->s.size;
        p->s.size = nunits;
      }

      freeptr = prevptr;
      result = p+1;
      is_allocating = false;             /* we are done */
    }

    if (p == freeptr)                    /* wrapped around free list */
    {
      p = morecore(nunits);
      if (p == NULL)
      {
        result = NULL;                   /* none left */
        is_allocating = false;
      }
    }
    prevptr = p;
  } /* for */

  return result;
}

<小时>

现在我们有一些可读、可维护的代码,没有许多危险的做法,甚至可以编译!所以现在我们可以开始思考代码实际上在做什么.


And now we have somewhat readable, maintainable code, without numerous dangerous practices, that will even compile! So now we could actually start to ponder about what the code is actually doing.

结构Header",正如您可能已经猜到的,是链表中节点的声明.每个这样的节点都包含一个指向下一个节点的指针.我不太了解morecore函数,也不是环绕",我从来没有用过这个函数,也没有sbrk.但我假设它分配了这个结构中指定的头,以及该头后面的一些原始数据块.如果是这样,这就解释了为什么没有实际的数据指针:假设数据紧跟在内存中的标头之后.因此,对于每个节点,我们都获得了头部,并且在头部之后我们获得了一大块原始数据.

The struct "Header" is, as you might have guessed, the declaration of a node in a linked list. Each such node contains a pointer to the next one. I don't quite understand the morecore function, nor the "wrap-around", I have never used this function, nor sbrk. But I assume that it allocates a header as specified in this struct, and also some chunk of raw data following that header. If so, that explains why there is no actual data pointer: the data is assumed to follow the header, adjacently in memory. So for each node, we get the header, and we get a chunk of raw data following the header.

迭代本身非常简单,它们通过一个单链表,一次一个节点.

The iteration itself is pretty straight-forward, they are going through a single-linked list, one node at a time.

在循环结束时,他们将指针设置为指向块"末尾的指针,然后将其存储在一个静态变量中,以便程序将记住它先前分配内存的位置,下次执行该函数时被称为.

At the end of the loop, they set the pointer to point one past the end of the "chunk", then store that in a static variable, so that the program will remember where it previously allocated memory, next time the function is called.

他们正在使用一种技巧来使他们的标头以对齐的内存地址结束:他们将所有开销信息与一个足够大的变量一起存储在一个联合中,以符合平台的对齐要求.因此,如果ptr"的大小加上size"的大小太小而无法精确对齐,则联合保证至少分配 sizeof(Align) 个字节.我相信这整个技巧今天已经过时了,因为 C 标准要求自动结构/联合填充.

They are using a trick to make their header end up on an aligned memory address: they store all the overhead info in a union together with a variable large enough to correspond to the platform's alignment requirement. So if the size of "ptr" plus the size of "size" are too small to give the exact alignment, the union guarantees that at least sizeof(Align) bytes are allocated. I believe that this whole trick is obsolete today, since the C standard mandates automatic struct/union padding.

这篇关于从 K&R 书中解释 malloc 的这种实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆