How to allocate aligned memory only using the standard library?


Problem description

I just finished a test as part of a job interview, and one question stumped me, even using Google for reference. I'd like to see what the StackOverflow crew can do with it:

The memset_16aligned function requires a 16-byte aligned pointer passed to it, or it will crash.

a) How would you allocate 1024 bytes of memory, and align it to a 16 byte boundary?
b) Free the memory after the memset_16aligned has executed.

{    
   void *mem;
   void *ptr;

   // answer a) here

   memset_16aligned(ptr, 0, 1024);

   // answer b) here    
}

Solution

Original answer

{
    void *mem = malloc(1024+16);
    void *ptr = ((char *)mem+16) & ~ 0x0F;
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Fixed answer

{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

Explanation as requested

The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc() is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types — long, double, long double, long long, and pointers to objects and pointers to functions. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system — hence questions and answers like this.)

The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc() returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary — so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~0x0F has all bits set to one except the last four. Anding that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works.
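The arithmetic in that paragraph can be checked on plain integers. The sketch below is illustrative only: the function names align16_add16 and align16_add15 are made up for this example, and it assumes uintptr_t is available (it is optional in C99, though nearly universal).

```c
#include <assert.h>
#include <stdint.h>

/* Original answer's arithmetic: add 16, then clear the low 4 bits.
 * The result is always strictly above addr, even if addr was aligned. */
static uintptr_t align16_add16(uintptr_t addr)
{
    uintptr_t result = (addr + 16) & ~(uintptr_t)0x0F;
    assert((result & 0x0F) == 0);
    return result;
}

/* Fixed answer's arithmetic: add 15, then clear the low 4 bits.
 * An already aligned addr is returned unchanged. */
static uintptr_t align16_add15(uintptr_t addr)
{
    uintptr_t result = (addr + 15) & ~(uintptr_t)0x0F;
    assert((result & 0x0F) == 0);
    return result;
}
```

For 0x800001 both functions return 0x800010. They differ only when the input is already aligned: for 0x800000 the add-16 form moves on to 0x800010, while the add-15 form stays at 0x800000, which is why 15 spare bytes are enough.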

The last step, free(), is easy: you always, and only, return to free() a value that one of malloc(), calloc() or realloc() returned to you — anything else is a disaster. You correctly provided mem to hold that value — thank you. The free releases it.

Finally, if you know about the internals of your system's malloc package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you'd not need to dink with the values. However, this is dodgy and non-portable — other malloc packages have different minimum alignments, and therefore assuming one thing when it does something different would lead to core dumps. Within broad limits, this solution is portable.
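Rather than guessing, you can ask the implementation what it promises. This is a sketch under assumptions: it needs C11 for _Alignof and max_align_t, and the helper names is_aligned and malloc_guaranteed_alignment are invented for illustration. The value it reports is the greatest fundamental alignment, which is what malloc() is supposed to satisfy. A particular malloc() may happen to do better, but that is not something to rely on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if p is a multiple of align.
 * align must be a power of 2. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical helper: the greatest fundamental alignment (C11). */
static size_t malloc_guaranteed_alignment(void)
{
    return _Alignof(max_align_t);
}
```

If malloc_guaranteed_alignment() is less than 16 on your platform, the over-allocate-and-mask approach (or one of the dedicated aligned allocation functions) is still needed.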

Someone else mentioned posix_memalign() as another way to get the aligned memory; that isn't available everywhere, but could often be implemented using this as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier.

One more comment — this code does not check that the allocation succeeded.
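A version with the missing check might look like the sketch below. The wrapper name zero_aligned_1024 and its return convention (0 on success, -1 on failure) are assumptions made for this example, and plain memset() stands in for memset_16aligned().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns 0 on success, -1 if malloc() failed. */
static int zero_aligned_1024(void)
{
    void *mem = malloc(1024 + 15);
    if (mem == NULL)
        return -1;                      /* nothing to align, nothing to free */

    void *ptr = (void *)(((uintptr_t)mem + 15) & ~(uintptr_t)0x0F);
    assert(((uintptr_t)ptr & 0x0F) == 0);

    memset(ptr, 0, 1024);               /* stand-in for memset_16aligned() */
    free(mem);                          /* release the malloc() value, not ptr */
    return 0;
}
```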

Amendment

Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code — converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR in the printf() statements, it would be sufficient to #include <stdint.h> instead of using #include <inttypes.h>. [This code includes the fix pointed out by C.R., which was reiterating a point first made by Bill K a number of years ago, which I managed to overlook until now.]

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

int main(void)
{
    void *mem = malloc(1024+15);
    void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
    return(0);
}

And here is a marginally more generalized version, which will work for sizes which are a power of 2:

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void memset_16aligned(void *space, char byte, size_t nbytes)
{
    assert((nbytes & 0x0F) == 0);
    assert(((uintptr_t)space & 0x0F) == 0);
    memset(space, byte, nbytes);  // Not a custom implementation of memset()
}

static void test_mask(size_t align)
{
    uintptr_t mask = ~(uintptr_t)(align - 1);
    void *mem = malloc(1024+align-1);
    void *ptr = (void *)(((uintptr_t)mem+align-1) & mask);
    assert((align & (align - 1)) == 0);
    printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
    memset_16aligned(ptr, 0, 1024);
    free(mem);
}

int main(void)
{
    test_mask(16);
    test_mask(32);
    test_mask(64);
    test_mask(128);
    return(0);
}

To convert test_mask() into a general purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers.
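One common way to do that is to stash the malloc() value in the bytes just before the aligned block, so the matching release function can find it again. The sketch below is one possible shape, not the only one: the names aligned_malloc_sketch and aligned_free_sketch are hypothetical, and memcpy() is used for the stash so the saved pointer itself does not need to be aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator: align must be a power of 2. */
static void *aligned_malloc_sketch(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Worst-case padding plus room for the saved release address. */
    size_t extra = (align - 1) + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;                    /* request would overflow size_t */

    void *mem = malloc(size + extra);
    if (mem == NULL)
        return NULL;

    /* Skip the stash area, then round up to the alignment. */
    uintptr_t start = (uintptr_t)mem + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* Save the release address immediately before the aligned block. */
    memcpy((char *)aligned - sizeof mem, &mem, sizeof mem);
    return (void *)aligned;
}

/* Hypothetical release function: only for aligned_malloc_sketch() pointers. */
static void aligned_free_sketch(void *ptr)
{
    if (ptr == NULL)
        return;
    void *mem;
    memcpy(&mem, (char *)ptr - sizeof mem, sizeof mem);
    free(mem);
}
```

The cost is sizeof(void *) extra bytes per allocation, and the caller must never pass these pointers to plain free().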

Problems with interviewers

Uri commented: Maybe I am having [a] reading comprehension problem this morning, but if the interview question specifically says: "How would you allocate 1024 bytes of memory" and you clearly allocate more than that. Wouldn't that be an automatic failure from the interviewer?

My response won't fit into a 300-character comment...

It depends, I suppose. I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited.

  • Clearly, one possibility is to allocate 1024 bytes and then give that address the 'alignment treatment'; the problem with that approach is that the actual available space is not properly determinate (the usable space is between 1008 and 1024 bytes, but there wasn't a mechanism available to specify which size), which renders it less than useful.
  • Another possibility is that you are expected to write a full memory allocator and ensure that the 1024-byte block you return is appropriately aligned. If that is the case, you probably end up doing an operation fairly similar to what the proposed solution did, but you hide it inside the allocator.

However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.)

The world moves on

The title of the question has changed recently. It was "Solve the memory alignment in C interview question that stumped me". The revised title ("How to allocate aligned memory only using the standard library?") demands a slightly revised answer, and this addendum provides it.

C11 (ISO/IEC 9899:2011) added function aligned_alloc():

7.22.3.1 The aligned_alloc function

Synopsis

#include <stdlib.h>
void *aligned_alloc(size_t alignment, size_t size);

Description
The aligned_alloc function allocates space for an object whose alignment is specified by alignment, whose size is specified by size, and whose value is indeterminate. The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.

Returns
The aligned_alloc function returns either a null pointer or a pointer to the allocated space.
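Applied to the interview question, aligned_alloc() could be used roughly as follows. This is a sketch that assumes a C11 library which provides the function (not every platform does). The wrapper name alloc_1024_aligned_16 is made up, and memset() again stands in for memset_16aligned().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns a zeroed, 16-byte aligned 1024-byte block,
 * or NULL on failure. The caller releases it with free(). */
static void *alloc_1024_aligned_16(void)
{
    /* size (1024) is an integral multiple of alignment (16), as required. */
    void *ptr = aligned_alloc(16, 1024);
    if (ptr == NULL)
        return NULL;

    assert(((uintptr_t)ptr & 0x0F) == 0);
    memset(ptr, 0, 1024);               /* stand-in for memset_16aligned() */
    return ptr;
}
```

No separate mem variable is needed here, because the pointer returned by aligned_alloc() is the one passed to free().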

And POSIX defines posix_memalign():

#include <stdlib.h>

int posix_memalign(void **memptr, size_t alignment, size_t size);

DESCRIPTION

The posix_memalign() function shall allocate size bytes aligned on a boundary specified by alignment, and shall return a pointer to the allocated memory in memptr. The value of alignment shall be a power of two multiple of sizeof(void *).

Upon successful completion, the value pointed to by memptr shall be a multiple of alignment.

If the size of the space requested is 0, the behavior is implementation-defined; the value returned in memptr shall be either a null pointer or a unique pointer.

The free() function shall deallocate memory that has previously been allocated by posix_memalign().

RETURN VALUE

Upon successful completion, posix_memalign() shall return zero; otherwise, an error number shall be returned to indicate the error.
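A sketch of the same task with posix_memalign() follows. It assumes a POSIX system; the feature-test macro and the wrapper name posix_alloc_1024_aligned_16 are choices made for this example. Note that the function reports failure through its return value rather than through errno.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L         /* ask <stdlib.h> for posix_memalign() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: returns a zeroed, 16-byte aligned 1024-byte block,
 * or NULL on failure. The caller releases it with free(). */
static void *posix_alloc_1024_aligned_16(void)
{
    void *ptr = NULL;

    /* 16 is a power-of-two multiple of sizeof(void *) on the usual
     * 32-bit and 64-bit platforms. */
    int rc = posix_memalign(&ptr, 16, 1024);
    if (rc != 0)
        return NULL;                    /* rc is an error number */

    assert(((uintptr_t)ptr & 0x0F) == 0);
    memset(ptr, 0, 1024);               /* stand-in for memset_16aligned() */
    return ptr;
}
```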

Either or both of these could be used to answer the question now, but only the POSIX function was an option when the question was originally answered.

Behind the scenes, the new aligned memory functions do much the same job as outlined in the question, except they have the ability to force the alignment more easily, and they keep track of the start of the aligned memory internally so that the calling code doesn't have to deal with it specially; it just frees the memory returned by the allocation function that was used.
