What are the semantics for dereferencing raw pointers?

Question

For shared references and mutable references the semantics are clear: as long as you have a shared reference to a value, nothing else may have mutable access to it, and a mutable reference can't be shared.

So this code:

#[no_mangle]
pub extern fn run_ref(a: &i32, b: &mut i32) -> (i32, i32) {
    let x = *a;
    *b = 1;
    let y = *a;
    (x, y)
}

compiles (on x86_64) to:

run_ref:
    movl    (%rdi), %ecx
    movl    $1, (%rsi)
    movq    %rcx, %rax
    shlq    $32, %rax
    orq     %rcx, %rax
    retq

Note that the memory a points to is only read once, because the compiler knows the write to b must not have modified the memory at a.

Raw pointers are more complicated. Raw pointer arithmetic and casts are "safe", but dereferencing them is not.

We can convert raw pointers back to shared and mutable references, and then use them; this will certainly imply the usual reference semantics, and the compiler can optimize accordingly.
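To make the "convert back to references" route concrete, here is a minimal sketch. The function name `sum_then_bump` is made up for illustration. Once the references exist, the usual rules apply, so the caller has to promise that `a` and `b` do not overlap.

```rust
/// Hypothetical helper: turns both raw pointers into references first.
///
/// Safety: `a` and `b` must be valid, aligned, and must NOT overlap,
/// because a `&i32` and a `&mut i32` are live at the same time below.
pub unsafe fn sum_then_bump(a: *const i32, b: *mut i32) -> i32 {
    // From here on the normal reference semantics are in force.
    let a_ref: &i32 = unsafe { &*a };
    let b_ref: &mut i32 = unsafe { &mut *b };

    let sum = *a_ref + *b_ref;
    *b_ref += 1;
    sum
}
```

Called with two distinct locals, e.g. `sum_then_bump(&x, &mut y)` inside an `unsafe` block, this behaves exactly like the version that takes `&i32` and `&mut i32`.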

But what are the semantics if we use raw pointers directly?

#[no_mangle]
pub unsafe extern fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}

compiles to:

run_ptr_direct:
    movl    (%rdi), %ecx
    movl    $1065353216, (%rsi)
    movl    (%rdi), %eax
    shlq    $32, %rax
    orq     %rcx, %rax
    retq

Although we write a value of a different type, the second read still goes to memory. It seems to be allowed to call this function with the same (or overlapping) memory location for both arguments. In other words, a const raw pointer does not forbid a coexisting mut raw pointer; and it's probably fine to have two mut raw pointers (of possibly different types) to the same (or overlapping) memory location too.
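A small sketch of the overlapping call, assuming (as the generated code suggests) that direct raw-pointer accesses to the same location are allowed. `call_with_aliased_pointers` is a name invented for this example, and `run_ptr_direct` is repeated without `#[no_mangle]` so the snippet stands alone.

```rust
pub unsafe fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    unsafe {
        let x = *a; // first read through the raw pointer
        *b = 1.0;   // write through the other raw pointer
        let y = *a; // second read: must see the write if a and b alias
        (x, y)
    }
}

/// Hypothetical driver: both arguments point at the same 4 bytes.
pub fn call_with_aliased_pointers() -> (i32, i32) {
    let mut value: i32 = 5;
    // Both pointers are derived from the same mutable borrow of `value`.
    let p: *mut i32 = &mut value;
    unsafe { run_ptr_direct(p as *const i32, p as *mut f32) }
}
```

The second component of the result is the bit pattern of `1.0f32` (`0x3f800000`, i.e. `1065353216`), which is the constant visible in the assembly above. An optimizer that dropped the second read would return the old value instead.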

Note that a normal optimizing C/C++ compiler would eliminate the second read (due to the "strict aliasing" rule: modifying/reading the same memory location through pointers of different ("incompatible") types is UB in most cases):

struct tuple { int x; int y; };

extern "C" tuple run_ptr(int const* a, float* b) {
    int const x = *a;
    *b = 1.0;
    int const y = *a;
    return tuple{x, y};
}

compiles to:

run_ptr:
    movl    (%rdi), %eax
    movl    $0x3f800000, (%rsi)
    movq    %rax, %rdx
    salq    $32, %rdx
    orq     %rdx, %rax
    ret

Playground with Rust code examples

godbolt Compiler Explorer with C example

So: What are the semantics if we use raw pointers directly: is it ok for referenced data to overlap?

This should have direct implications for whether the compiler is allowed to reorder memory accesses through raw pointers.

Answer

No awkward strict-aliasing here

C++ strict-aliasing is a patch on a wooden leg. C++ does not have any aliasing information, and the absence of aliasing information prevents a number of optimizations (as you noted here), therefore to regain some performance strict-aliasing was patched on...

Unfortunately, strict-aliasing is awkward in a systems language, because reinterpreting raw memory is the essence of what systems languages are designed to do.

And doubly unfortunately it does not enable that many optimizations. For example, copying from one array to another must assume that the arrays may overlap.
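Rust's standard library makes the same "may overlap" distinction explicit for raw pointers: `std::ptr::copy` tolerates overlapping ranges (like `memmove`), while `std::ptr::copy_nonoverlapping` requires the caller to rule overlap out (like `memcpy`). A minimal sketch, with `shift_right_by_one` being a name made up for this example:

```rust
use std::ptr;

/// Hypothetical helper: copies elements 0..4 onto 1..5 of the same array.
/// Source and destination overlap, so `ptr::copy` is the right call here;
/// `ptr::copy_nonoverlapping` would be undefined behavior.
pub fn shift_right_by_one(mut data: [i32; 5]) -> [i32; 5] {
    // Take a single raw pointer and derive both ends of the copy from it.
    let p = data.as_mut_ptr();
    unsafe {
        ptr::copy(p, p.add(1), 4);
    }
    data
}
```

For `[1, 2, 3, 4, 5]` this yields `[1, 1, 2, 3, 4]`: the first element is left in place and the rest are shifted one slot to the right.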

restrict (from C) is a bit more helpful, although it only applies to one level at a time.


Instead, we have scope-based aliasing analysis

The essence of the aliasing analysis in Rust is based on lexical scopes (barring threads).

The beginner level explanation that you probably know is:

  • if you have a &T, then there is no &mut T to the same instance,
  • if you have a &mut T, then there is no &T or &mut T to the same instance.

As suited to a beginner, it is a slightly abbreviated version. For example:

fn main() {
    let mut i = 32;
    let mut_ref = &mut i;
    let x: &i32 = mut_ref;

    println!("{}", x);
}

is perfectly fine, even though both a &mut i32 (mut_ref) and a &i32 (x) point to the same instance!

If you try to access mut_ref after forming x, however, the truth is unveiled:

fn main() {
    let mut i = 32;
    let mut_ref = &mut i;
    let x: &i32 = mut_ref;
    *mut_ref = 2;
    println!("{}", x);
}

error[E0506]: cannot assign to `*mut_ref` because it is borrowed
  |
4 |         let x: &i32 = mut_ref;
  |                       ------- borrow of `*mut_ref` occurs here
5 |         *mut_ref = 2;
  |         ^^^^^^^^^^^^ assignment to borrowed `*mut_ref` occurs here

So, it is fine to have both &mut T and &T pointing to the same memory location at the same time; however mutating through the &mut T will be disabled for as long as the &T exists.

In a sense, the &mut T is temporarily downgraded to a &T.
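The downgrade really is temporary. A sketch, assuming a compiler with non-lexical lifetimes (the default in current Rust), where the shared borrow ends at its last use rather than at the end of the block; `downgrade_then_mutate` is an invented name:

```rust
pub fn downgrade_then_mutate() -> (i32, i32) {
    let mut i = 32;
    let mut_ref = &mut i;

    // Shared reborrow: while `x` is in use, `mut_ref` is read-only.
    let x: &i32 = mut_ref;
    let seen = *x; // last use of `x`

    // `x` is no longer used, so `mut_ref` regains its mutable access.
    *mut_ref = 2;

    (seen, i)
}
```

This returns `(32, 2)`. Moving the assignment above the last use of `x` brings back the E0506 error shown earlier.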


So, what of pointers?

First of all, let's review the reference. Raw pointers:

  • are not guaranteed to point to valid memory and are not even guaranteed to be non-NULL (unlike both Box and &);
  • do not have any automatic clean-up, unlike Box, and so require manual resource management;
  • are plain-old-data, that is, they don't move ownership, again unlike Box, hence the Rust compiler cannot protect against bugs like use-after-free;
  • lack any form of lifetimes, unlike &, and so the compiler cannot reason about dangling pointers; and
  • have no guarantees about aliasing or mutability other than mutation not being allowed directly through a *const T.

Conspicuously absent is any rule forbidding casting a *const T to a *mut T. That's normal, it's allowed, and therefore the last point is really more of a lint, since it can be so easily worked around.
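A short sketch of how easy the workaround is. The name `write_through_cast` is invented, and the example assumes the pointer originally came from a mutable borrow, so writing through it is legitimate; casting away constness on a pointer that came from a `&T` would be a different story.

```rust
pub fn write_through_cast() -> i32 {
    let mut value = 1;

    // The pointer starts out with write permission, then is viewed as *const.
    let read_only: *const i32 = &mut value as *mut i32 as *const i32;

    // `*read_only = 2;` would not compile, but a cast gets around that.
    let writable = read_only as *mut i32;
    unsafe {
        *writable = 2;
    }

    value
}
```

The function returns `2`: the `*const` only blocked the direct assignment, not the mutation itself.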

Nomicon

A discussion of unsafe Rust would not be complete without pointing to the Nomicon.

Essentially, the rules of unsafe Rust are rather simple: uphold whatever guarantees the compiler would have if it were safe Rust.

This is not as helpful as it could be, since those rules are not set in stone yet; sorry.

Then, what are the semantics for dereferencing raw pointers?

As far as I know [1]:

  • if you form a reference from the raw pointer (&T or &mut T) then you must ensure that the aliasing rules these references obey are upheld,
  • if you immediately read/write, this temporarily forms a reference.

That is, provided that the caller had mutable access to the location:

pub unsafe fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}

should be valid, because *a has type i32, so there is no overlap of lifetime in references.

However, I would expect:

pub unsafe fn run_ptr_modified(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = &*a;
    *b = 1.0;
    let y = *a;
    (*x, y)
}

to be undefined behavior when a and b overlap, because x would be live while *b is used to modify its memory.

Note how subtle the change is. It's easy to break invariants in unsafe code.
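If a reference is needed at all, one way to stay on the safe side of the rule above is to make sure it is dead before the write happens. This is only a sketch under the model described in this answer, with `run_ptr_scoped` being an invented name:

```rust
pub unsafe fn run_ptr_scoped(a: *const i32, b: *mut f32) -> (i32, i32) {
    // The reference lives only inside this block; what escapes is a copy.
    let x: i32 = {
        let r: &i32 = unsafe { &*a };
        *r
    };

    // No reference to the location is live any more, so the write is fine
    // even if `a` and `b` point at the same memory.
    unsafe { *b = 1.0 };

    let y = unsafe { *a };
    (x, y)
}
```

Called with two pointers derived from the same `*mut i32`, this returns the original value followed by the bit pattern of `1.0f32`, just like run_ptr_direct.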

[1] And I might be wrong right now, or I may become wrong in the future.
