C - Conversion behavior between two pointers


Problem description


Update 2020-12-11: Thanks to @"Some programmer dude" for the suggestion in the comments. My underlying problem is that our team is implementing a storage engine for dynamically typed data. We allocate multiple char array[PAGE_SIZE] buffers, aligned to 16 bytes, to store dynamically typed data (there is no fixed struct). For efficiency reasons, we cannot perform byte encoding or allocate additional space in order to use memcpy.


Since the alignment is known (16 bytes), all that remains is to use pointer casts to access objects of the specified type, for example:

int main() {
    // simulate our 16-aligned malloc
    _Alignas(16) char buf[4096];

    // store some dynamic data:
    *((unsigned long *) buf) = 0xff07;
    *(((double *) buf) + 2) = 1.618;
}


But our team disagrees on whether this operation is undefined behavior.


I have read many similar questions, such as

  • Why does -Wcast-align not warn about cast from char* to int* on x86?
  • How to cast a char array to an int at a non-aligned position?
  • C undefined behavior. Strict aliasing rule, or incorrect alignment?

But these differ from my interpretation of the C standard, and I want to know whether I am misunderstanding it.

My main confusion is about 6.3.2.3 #7 in C11:

A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned68) for the referenced type, the behavior is undefined.

68) In general, the concept ‘‘correctly aligned’’ is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.

Does "the resulting pointer" here refer to the pointer object or to the pointer value?

In my opinion, the answer is the pointer object, but more of the answers I have seen seem to indicate the pointer value.

Interpretation A: Pointer Object

My thoughts are as follows: a pointer is itself an object. According to 6.2.5 #28, pointers to different types may have different representation and alignment requirements. Therefore, according to 6.3.2.3 #7, as long as the two pointer types have the same alignment requirement, a pointer can be safely converted from one to the other without undefined behavior, although there is no guarantee that the result can be dereferenced. Expressed as a program:

#include <stdio.h>

int main(void) {
    char buf[4096];

    char *pc = buf;
    if (_Alignof(char *) == _Alignof(int *)) {
        // cast safely, because the two pointer types have the same alignment requirement?
        int *pi = (int *) pc;
        printf("pi: %p\n", (void *) pi);  // %p expects a void *
    } else {
        printf("char * and int * don't have the same alignment.\n");
    }
}


Interpretation B: Pointer Value

However, if the C11 standard is talking about the pointer value (whether it is correctly aligned for the referenced type) rather than the pointer object, then the alignment check in the code above is meaningless. Expressed as a program:

#include <stdio.h>

int main(void) {
    char buf[4096];

    char *pc = buf;

    /*
     * potentially undefined behavior, because:
     * the alignment of char is 1,
     * the alignment of int is typically 4,
     *
     * and we don't know whether the value of pc is 4-aligned.
     */
    int *pi = (int *) pc;
    printf("pi: %p\n", (void *) pi);  // %p expects a void *
}


Which interpretation is correct?

Recommended answer

Interpretation B is correct. The standard is talking about a pointer (a value that points to an object), not about an object of pointer type. "Resulting pointer" refers to the result of the cast, and a cast does not produce an lvalue, so it refers to the pointer value after the cast.

Taking the code in your example, suppose that an int must be aligned on a 4-byte boundary, i.e. its address must be a multiple of 4. If the address of buf is 0x1001, then converting that address to int * is invalid (the behavior is undefined) because the pointer value is not correctly aligned. If the address of buf is 0x1000, then converting it to int * is valid.
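To make interpretation B concrete, here is a minimal sketch of checking the pointer value before converting it. The helper names is_aligned_for_int and as_int_ptr are hypothetical, and the check assumes that converting a pointer to uintptr_t yields its numeric address, which is implementation-defined but holds on mainstream platforms. Note that the requirement being tested is _Alignof(int), the alignment of the referenced type, not _Alignof(int *), the alignment of a pointer object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the address held in p is a multiple of
 * _Alignof(int), i.e. the pointer VALUE is correctly aligned for int.
 * Assumes the pointer-to-uintptr_t conversion yields the plain address
 * (implementation-defined, but true on common platforms). */
static bool is_aligned_for_int(const void *p)
{
    return (uintptr_t) p % _Alignof(int) == 0;
}

/* Hypothetical helper: converts a char pointer to int * only after
 * asserting that its value is suitably aligned (6.3.2.3p7). */
static int *as_int_ptr(char *pc)
{
    assert(is_aligned_for_int(pc));
    return (int *) pc;
}
```

For example, with _Alignas(int) char buf[8], as_int_ptr(buf) passes the check, while is_aligned_for_int(buf + 1) returns false whenever _Alignof(int) > 1. This covers only the conversion; dereferencing the result is a separate question.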

Update:

The code you added addresses the alignment issue, so it's fine in that regard. It does, however, have a different issue: it violates strict aliasing.

The array you defined contains objects of type char. By casting the address to a different type and then dereferencing the converted pointer, you are accessing objects of one type as objects of another type. The C standard does not allow this.

Though the term "strict aliasing" is not used in the standard, the concept is described in section 6.5 paragraphs 6 and 7:

6 The effective type of an object for an access to its stored value is the declared type of the object, if any.87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.

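7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:88)

• a type compatible with the effective type of the object,
• a qualified version of a type compatible with the effective type of the object,
• a type that is the signed or unsigned type corresponding to the effective type of the object,
• a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
• an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
• a character type.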

...

87) Allocated objects have no declared type.

88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
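The last bullet is what makes the rule asymmetric: any object may be read through a character-type lvalue, but a char object may not be accessed through an unsigned long or double lvalue. A minimal sketch of the permitted direction follows; bytes_equal is a hypothetical name, and it simply compares object representations byte by byte (equivalent in effect to memcmp).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compares the object representations of two objects.
 * Reading any object through an unsigned char lvalue is permitted by the
 * "character type" bullet of 6.5p7, whatever the object's effective type. */
static bool bytes_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (pa[i] != pb[i])
            return false;
    }
    return true;
}
```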

In your example, you're writing an unsigned long and a double on top of char objects. Neither of these types satisfies the conditions of paragraph 7.
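A common conforming alternative for a declared char array is to move values in and out with memcpy, so that the array is only ever accessed as characters. The helpers below (store_ulong, load_ulong, store_double, load_double) are hypothetical names, sketched under the assumption that callers track the buffer size. They need no extra buffer space beyond a local variable, and GCC and Clang typically compile a fixed-size memcpy like this into a single load or store when optimizing, though that is worth measuring for your own workload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: copy values into and out of a char buffer at a byte
 * offset. memcpy accesses the buffer only as characters, so the declared
 * char array is never accessed through an unsigned long or double lvalue. */
static void store_ulong(char *buf, size_t size, size_t offset, unsigned long v)
{
    assert(offset <= size && sizeof v <= size - offset);  /* stay in bounds */
    memcpy(buf + offset, &v, sizeof v);
}

static unsigned long load_ulong(const char *buf, size_t size, size_t offset)
{
    unsigned long v;
    assert(offset <= size && sizeof v <= size - offset);
    memcpy(&v, buf + offset, sizeof v);
    return v;
}

static void store_double(char *buf, size_t size, size_t offset, double v)
{
    assert(offset <= size && sizeof v <= size - offset);
    memcpy(buf + offset, &v, sizeof v);
}

static double load_double(const char *buf, size_t size, size_t offset)
{
    double v;
    assert(offset <= size && sizeof v <= size - offset);
    memcpy(&v, buf + offset, sizeof v);
    return v;
}
```

Because memcpy has no alignment requirement, these helpers also work at offsets that are not multiples of the value's alignment.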

In addition to that, the pointer arithmetic here is not valid:

*(((double *) buf) + 2) = 1.618;

This is because you're treating buf as an array of double when it is not one. At the very least, you would need to perform the arithmetic on buf directly (in units of char) and cast the result at the end.
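A minimal sketch of that fix, using a hypothetical helper double_slot: the offset is computed on the char pointer, which stays within the char array, and the conversion to double * happens last. It assumes buf is aligned for double (as the _Alignas(16) buffer in the question is), so every multiple of sizeof(double) is also correctly aligned. This repairs only the pointer arithmetic; storing through the result into a declared char array still runs into the effective-type rules quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the slot-th double-sized slot
 * of buf. Arithmetic is done on the char pointer and the cast comes last.
 * Assumes buf itself is aligned for double; since sizeof(double) is a
 * multiple of _Alignof(double), each slot is then aligned too. */
static double *double_slot(char *buf, size_t buf_size, size_t slot)
{
    size_t offset = slot * sizeof(double);
    assert(offset <= buf_size && sizeof(double) <= buf_size - offset);
    return (double *) (buf + offset);
}
```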

So why is this a problem for a char array and not for a buffer returned by malloc? Because memory returned from malloc has no effective type until you store something in it, which is what paragraph 6 and footnote 87 describe.
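For comparison, here is a sketch of the same two stores into storage obtained from malloc; the helper make_page is hypothetical. Allocated memory has no declared type and is suitably aligned for any fundamental type, so each store through an unsigned long or double lvalue sets the effective type of the bytes it writes, and later reads through the same types are permitted. The offset arithmetic is still done on the char pointer. If a true 16-byte alignment guarantee is needed, C11 aligned_alloc is an option, with the caveat that C11 requires its size to be a multiple of the alignment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a page and stores an unsigned long at
 * offset 0 and a double at offset 2 * sizeof(double). Memory from malloc
 * has no declared type, so each store sets the effective type of the bytes
 * it writes (6.5p6); malloc's result is aligned for any fundamental type. */
static char *make_page(size_t page_size)
{
    assert(page_size >= 3 * sizeof(double));
    char *page = malloc(page_size);
    if (page == NULL)
        return NULL;
    *(unsigned long *) page = 0xff07;
    *(double *) (page + 2 * sizeof(double)) = 1.618;
    return page;
}
```

The caller reads the values back through the same types and releases the page with free(page).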

So from a strict point of view of the standard, what you're doing is undefined behavior. But depending on your compiler, you may be able to disable strict aliasing so that this works as intended. If you're using gcc, you'll want to pass the -fno-strict-aliasing flag.
