HashMap和Vec之间的str的共享所有权 [英] Shared ownership of an str between a HashMap and a Vec

查看:123
本文介绍了HashMap和Vec之间的str的共享所有权的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我来自Java / C#/ JavaScript背景,我试图实现一个 Dictionary ,它将为每个传递的字符串分配一个永不改变的id。字典应该能够通过指定的ID返回一个字符串。这允许在文件系统中存储一些具有大量重复字符串的数据,因为只会存储字符串的id而不是整个字符串。



I认为一个带有 HashMap 和一个 Vec 的结构可以做,但事实证明它比这更复杂。 / p>

我开始使用& str 作为 HashMap 和一个 Vec 的项目,如下例所示。 HashMap 的值作为 Vec 的索引。

  pub结构词典<'a> {
values_map:HashMap<&'a str,u32>,
keys_map:Vec<&'a str>
}

impl<'a>字典<一> {
pub fn put_and_get_key(& mut self,value:&'a str) - > u32 {
match self.values_map.get_mut(value){
None => {
let id_usize = self.keys_map.len();
让id = id_usize为u32;
self.keys_map.push(value);
self.values_map.insert(value,id);
id
},
一些(& mut id)=> id
}
}
}

事实证明, str 需要存储在某个地方,最好在同一个 struct 中。我尝试在 Vec &'一个str中存储 Box< str> HashMap

  pub结构字典<'一> {
values_map:HashMap<&'a str,u32>,
keys_map:Vec< Box< str>>

借用检查器当然不允许这样做,因为它会允许一个悬挂当一个项目从 Vec 中被移除时(或者实际上有时当另一个项目被添加到该项目时, HashMap Vec 但这里是一个焦点话题。)



我知道我需要写不安全代码或使用某种形式的共享所有权,其中最简单的一种似乎是 Rc 。使用 Rc< Box< str>> 看起来像引入了双重间接,但似乎没有简单的方法来构造 Rc< str>



  pub结构字典{
values_map:HashMap< Rc< Box< ; str>> ;, u32>,
keys_map:Vec< Rc< Box< str>>>
}

impl字典{
pub fn put_and_get_key(& mut self,value:& str) - > u32 {
match self.values_map.get_mut(value){
None => {
let id_usize = self.keys_map.len();
让id = id_usize为u32;
let value_to_store = Rc :: new(value.to_owned()。into_boxed_str());
self.keys_map.push(value_to_store);
self.values_map.insert(value_to_store,id);
id
},
一些(& mut id)=> id
}
}
}

到所有权语义,但是上面的代码不能编译,因为 HashMap 现在需要一个 Rc ,而不是


$ b

  error [E0277]:不符合`std :: rc :: Rc< Box< str>>:std :: borrow :: Borrow< str>`的特征
- > src / file_structure / sample_dictionary.rs:14:31
|
14 |匹配self.values_map.get_mut(值){
| ^ std :: rc :: Rc< Box< str>>`
|
= help:找到以下实现:
= help:< std :: rc :: Rc< T> as std :: borrow :: Borrow< T>>

问题:


  1. 有没有办法构建 Rc< str>

  2. 哪些其他结构,方法或方法可以帮助解决这个问题。从本质上讲,我需要一种方法来有效地存储两个映射字符串的id id-by-string 并且是能够通过& str 检索一个id,即没有任何过多的分配。



有没有办法构建 Rc< str>


令人烦恼的是,并不是我所知道的。 Rc :: new 需要一个 Sized 参数,我不确定它是实际的限制还是只是某种被遗忘了。


哪些其他结构,方法或方法可以帮助解决这个问题?
< blockquote>

如果您查看 get 您会发现:

< code>> fn< Q:?Sized>(& self,k:& Q) - >选项<&安培; V> 
其中K:借贷,Q:哈希+ Eq

结果,如果 K 实现借用< str> & str 进行搜索$ c $。
$ b

字符串执行借用< str> ,所以最简单的解决方法是简单地使用 String 作为关键字。当然,这意味着你实际上会有两个 String 而不是一个...但这很简单。当然,一个 String Box< str> 更简单(尽管它再多用8个字节)。



如果您想削减这笔费用,您可以使用自定义结构:

 #[derive(Clone,Debug)] 
struct RcStr(Rc< String>);

然后执行借用< str> 它。然后,您将为每个密钥分配2次(1为 Rc ,1为 String )。根据你的 String 的大小,它可能会占用更少或更多的内存。






如果你想进一步(为什么不?),这里有一些想法:


  • 实现自己的引用计数字符串,在单个堆分配中,
  • 在插入到 Dictionary

  • ...


I come from a Java/C#/JavaScript background and I am trying to implement a Dictionary that would assign each passed string an id that never changes. The dictionary should be able to return a string by the specified id. This allows to store some data that has a lot of repetitive strings far more efficiently in the file system because only the ids of strings would be stored instead of entire strings.

I thought that a struct with a HashMap and a Vec would do but it turned out to be more complicated than that.

I started with the usage of &str as a key for HashMap and an item of Vec like in the following sample. The value of HashMap serves as an index into Vec.

pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<&'a str>
}

impl<'a> Dictionary<'a> {
    pub fn put_and_get_key(&mut self, value: &'a str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                self.keys_map.push(value);
                self.values_map.insert(value, id);
                id
            },
            Some(&mut id) => id
        }
    }
}

This works just fine until it turns out that the strs need to be stored somewhere, preferably in this same struct as well. I tried to store a Box<str> in the Vec and &'a str in the HashMap.

pub struct Dictionary<'a> {
    values_map: HashMap<&'a str, u32>,
    keys_map: Vec<Box<str>>
}

The borrow checker did not allow this of course because it would have allowed a dangling pointer in the HashMap when an item is removed from the Vec (or in fact sometimes when another item is added to the Vec but this is an off-topic here).

I understood that I either need to write unsafe code or use some form of shared ownership, the simplest kind of which seems to be an Rc. The usage of Rc<Box<str>> looks like introducing double indirection but there seems to be no simple way to construct an Rc<str> at the moment.

pub struct Dictionary {
    values_map: HashMap<Rc<Box<str>>, u32>,
    keys_map: Vec<Rc<Box<str>>>
}

impl Dictionary {
    pub fn put_and_get_key(&mut self, value: &str) -> u32 {
        match self.values_map.get_mut(value) {
            None => {
                let id_usize = self.keys_map.len();
                let id = id_usize as u32;
                let value_to_store = Rc::new(value.to_owned().into_boxed_str());
                self.keys_map.push(value_to_store);
                self.values_map.insert(value_to_store, id);
                id
            },
            Some(&mut id) => id
        }
    }
}

Everything seems fine with regard to ownership semantics, but the code above does not compile because the HashMap now expects an Rc, not an &str:

error[E0277]: the trait bound `std::rc::Rc<Box<str>>: std::borrow::Borrow<str>` is not satisfied
  --> src/file_structure/sample_dictionary.rs:14:31
   |
14 |         match self.values_map.get_mut(value) {
   |                               ^^^^^^^ the trait `std::borrow::Borrow<str>` is not implemented for `std::rc::Rc<Box<str>>`
   |
   = help: the following implementations were found:
   = help:   <std::rc::Rc<T> as std::borrow::Borrow<T>>

Questions:

  1. Is there a way to construct an Rc<str>?
  2. Which other structures, methods or approaches could help to resolve this problem. Essentially, I need a way to efficiently store two maps string-by-id and id-by-string and be able to retrieve an id by &str, i.e. without any excessive allocations.

解决方案

Is there a way to construct an Rc<str>?

Annoyingly, not that I know of. Rc::new requires a Sized argument, and I am not sure whether it is an actual limitation, or just something which was forgotten.

Which other structures, methods or approaches could help to resolve this problem?

If you look at the signature of get you'll notice:

fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where K: Borrow<Q>, Q: Hash + Eq

As a result, you could search by &str if K implements Borrow<str>.

String implements Borrow<str>, so the simplest solution is to simply use String as a key. Sure it means you'll actually have two String instead of one... but it's simple. Certainly, a String is simpler to use than a Box<str> (although it uses 8 more bytes).

If you want to shave off this cost, you can use a custom structure instead:

#[derive(Clone, Debug)]
struct RcStr(Rc<String>);

And then implement Borrow<str> for it. You'll then have 2 allocations per key (1 for Rc and 1 for String). Depending on the size of your String, it might consume less or more memory.


If you wish to got further (why not?), here are some ideas:

  • implement your own reference-counted string, in a single heap-allocation,
  • use a single arena for the slice inserted in the Dictionary,
  • ...

这篇关于HashMap和Vec之间的str的共享所有权的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆