Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

Question

In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has a *const T field, but no PhantomData<T>:

The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.

What does this mean? If I implement Drop for a struct and manually destroy all the Ts in it, why should I care whether the compiler knows that my struct owns some Ts?

Recommended answer

The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
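
To make the shape under discussion concrete, here is a minimal sketch on stable Rust. It is not the standard library's definition of Vec<T>: RawOwner, Counted and count_drops are hypothetical names introduced only for illustration, and the buffer is borrowed from a real Vec<T> to keep the unsafe code small. The sketch shows a raw pointer paired with a PhantomData<T> marker, and a manual Drop impl that really does run the destructors of the T values it owns.

```rust
use std::cell::Cell;
use std::marker::PhantomData;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical Vec-like owner: a raw pointer plus an ownership marker.
struct RawOwner<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
    // Tells the compiler that this type logically owns values of type T.
    _owns: PhantomData<T>,
}

impl<T> RawOwner<T> {
    /// Takes over the buffer of an existing Vec without dropping it.
    fn from_vec(v: Vec<T>) -> Self {
        let mut v = ManuallyDrop::new(v);
        RawOwner {
            ptr: v.as_mut_ptr(),
            len: v.len(),
            cap: v.capacity(),
            _owns: PhantomData,
        }
    }
}

impl<T> Drop for RawOwner<T> {
    fn drop(&mut self) {
        // SAFETY: ptr, len and cap came from a Vec<T> that was never dropped.
        // Rebuilding and dropping that Vec runs every T destructor.
        unsafe { drop(Vec::from_raw_parts(self.ptr, self.len, self.cap)) }
    }
}

/// Hypothetical element type that counts how often its destructor runs.
struct Counted(Rc<Cell<usize>>);

impl Drop for Counted {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Builds n elements, hands them to RawOwner, drops it, and reports
/// how many element destructors ran.
fn count_drops(n: usize) -> usize {
    let counter = Rc::new(Cell::new(0));
    let items: Vec<Counted> = (0..n).map(|_| Counted(Rc::clone(&counter))).collect();
    drop(RawOwner::from_vec(items));
    counter.get()
}

fn main() {
    println!("element destructors run: {}", count_drops(3));
}
```

Because this Drop impl does not use #[may_dangle], the compiler is already conservative about T here; the marker becomes decisive for drop check once the destructor opts out with #[may_dangle], which is the situation the rest of this answer describes.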

Deep dive: We have a combination of factors here:

  • We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).

Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.

However, the destructor for Vec<T> specifically opts out of these semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows a vector to hold references that have the same lifetime as the vector itself. This is considered sound as long as an important caveat holds; after all, the vector itself will not dereference data pointed to by such references (all it is doing is dropping values and deallocating the backing array).

The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
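
The caveat can be sketched on stable Rust. In the example below, Inspector and drop_log are hypothetical names introduced for illustration. The element's destructor reads through its reference, so the referenced String must still be alive when the vector is dropped. The code arranges the scopes so that it is, and records what each destructor observed.

```rust
use std::cell::RefCell;

/// Hypothetical element type whose destructor reads through its references.
struct Inspector<'a> {
    name: &'static str,
    watched: &'a String,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Drop for Inspector<'a> {
    fn drop(&mut self) {
        // This read is only sound if `watched` has not been dropped yet.
        self.log
            .borrow_mut()
            .push(format!("{} saw {}", self.name, self.watched));
    }
}

/// Drops a Vec<Inspector> while the referenced data is still alive and
/// returns what the element destructors observed.
fn drop_log() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let watched = String::from("days");
    {
        // `_v` lives in an inner scope, so it is dropped strictly before
        // `watched` and `log`.
        let _v = vec![
            Inspector { name: "a", watched: &watched, log: &log },
            Inspector { name: "b", watched: &watched, log: &log },
        ];
    }
    log.into_inner()
}

fn main() {
    for line in drop_log() {
        println!("{}", line);
    }
}
```

Dropping the vector runs each element's destructor in order, and each one dereferences `watched`. If the String could be destroyed first, those reads would touch freed memory, which is exactly what dropck has to rule out.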

So, diving in even more deeply: to confirm dropck validity for a given structure S, we first check whether S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we recursively descend into the structure of S itself and check, for each of its fields, that everything is acceptable according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)

In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
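
A loosely related, runnable way to see that a raw pointer carries no ownership is to look at drop glue with std::mem::needs_drop. This is an analogy rather than dropck itself, and drop_glue is a hypothetical helper name. Note also that needs_drop is documented as being allowed to answer conservatively, so the values below reflect how current compilers behave in practice rather than a guarantee.

```rust
use std::marker::PhantomData;
use std::mem::needs_drop;

/// Reports whether dropping each kind of field would run any code:
/// an owning Box, a raw pointer, and a PhantomData marker.
fn drop_glue() -> (bool, bool, bool) {
    (
        // Box<String> owns its String, so dropping it must run code.
        needs_drop::<Box<String>>(),
        // A raw pointer owns nothing; dropping it is a no-op.
        needs_drop::<*const String>(),
        // PhantomData is zero-sized and adds no drop glue of its own.
        needs_drop::<PhantomData<String>>(),
    )
}

fn main() {
    let (boxed, raw, phantom) = drop_glue();
    println!("Box<String>:         {}", boxed);
    println!("*const String:       {}", raw);
    println!("PhantomData<String>: {}", phantom);
}
```

The marker adds no runtime behavior of its own. Its role is purely to tell the type system about an ownership relation that the raw pointer field cannot express.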

Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).

BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."

The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
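
The flexible half of this end effect can be observed on stable Rust with the standard Vec<T>. In the sketch below, World and same_lifetime_refs are hypothetical names introduced for illustration. The vector holds plain references, which have no destructor, so it may point at a sibling field that lives exactly as long as the vector does.

```rust
/// Hypothetical struct whose vector borrows from a sibling field, so the
/// references live exactly as long as the vector that holds them.
struct World<'a> {
    refs: Vec<&'a u8>,
    days: Box<u8>,
}

fn same_lifetime_refs() -> u8 {
    let mut world = World {
        refs: Vec::new(),
        days: Box::new(1),
    };
    // Accepted: `&'a u8` has no destructor, so nothing will read through
    // the reference while `world` is being torn down.
    world.refs.push(&*world.days);
    *world.refs[0]
}

fn main() {
    println!("{}", same_lifetime_refs());
}
```

If the element type were instead a struct holding `&'a u8` with its own Drop impl (and no opt-out), the same push should be rejected by the compiler, because that destructor could read `days` while `world` is being dropped.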

If you want to understand this via concrete exploration, you can compare how the compiler treats small container types that vary in their use of #[may_dangle] and PhantomData.

Here is some sample code I have whipped up to illustrate this:

// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case, 
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)

#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]

extern crate alloc;

use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;

#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }

#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);

impl<T: fmt::Debug> PrintOnDrop<T> {
    fn new(name: &'static str, t: T) -> Self {
        PrintOnDrop(name, t, State::Valid)
    }
}

impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
    fn drop(&mut self) {
        println!("drop PrintOnDrop({}, {:?}, {:?})",
                 self.0,
                 self.1,
                 self.2);
        self.2 = State::INVALID;
    }
}

struct MyBox1<T> {
    v: Box<T>,
}

impl<T> MyBox1<T> {
    fn new(t: T) -> Self {
        MyBox1 { v: Box::new(t) }
    }
}

struct MyBox2<T> {
    v: *const T,
}

impl<T> MyBox2<T> {
    fn new(t: T) -> Self {
        unsafe {
            let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
            let p = p as *mut T;
            ptr::write(p, t);
            MyBox2 { v: p }
        }
    }
}

unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
    fn drop(&mut self) {
        unsafe {
            // We want this to be *legal*. This destructor is not 
            // allowed to call methods on `T` (since it may be in
            // an invalid state), but it should be allowed to drop
            // instances of `T` as it deconstructs itself.
            //
            // (Note however that the compiler has no knowledge
            //  that `MyBox2<T>` owns an instance of `T`.)
            ptr::read(self.v);
            heap::deallocate(self.v as *mut u8,
                             mem::size_of::<T>(),
                             mem::align_of::<T>());
        }
    }
}

struct MyBox3<T> {
    v: *const T,
    _pd: PhantomData<T>,
}

impl<T> MyBox3<T> {
    fn new(t: T) -> Self {
        unsafe {
            let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
            let p = p as *mut T;
            ptr::write(p, t);
            MyBox3 { v: p, _pd: Default::default() }
        }
    }
}

unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
    fn drop(&mut self) {
        unsafe {
            ptr::read(self.v);
            heap::deallocate(self.v as *mut u8,
                             mem::size_of::<T>(),
                             mem::align_of::<T>());
        }
    }
}

fn f1() {
    // `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
    let v1; let _mb1;
    v1 = PrintOnDrop::new("v1", 13);
    _mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}

fn f2() {
    {
        let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
        v2a = PrintOnDrop::new("v2a", 13);
        _mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
    }

    {
        let (_mb2b, v2b); // Unsound!
        v2b = PrintOnDrop::new("v2b", 13);
        _mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
        // namely, v2b dropped before _mb2b, but latter contains
        // value that attempts to access v2b when being dropped.
    }
}

fn f3() {
    let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
    v3 = PrintOnDrop::new("v3", 13);
    _mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}

fn main() {
    f1(); f2(); f3();
}
