HashMap和Vec之间的str的共享所有权 [英] Shared ownership of an str between a HashMap and a Vec
问题描述
我来自Java / C#/ JavaScript背景,我试图实现一个 Dictionary
,它将为每个传递的字符串分配一个永不改变的id。字典应该能够通过指定的ID返回一个字符串。这允许在文件系统中存储一些具有大量重复字符串的数据,因为只会存储字符串的id而不是整个字符串。
I认为一个带有 HashMap
和一个 Vec
的结构可以做,但事实证明它比这更复杂。 / p>
我开始使用& str
作为 HashMap
和一个 Vec
的项目,如下例所示。 HashMap
的值作为 Vec
的索引。
pub结构词典<'a> {
values_map:HashMap<&'a str,u32>,
keys_map:Vec<&'a str>
}
impl<'a>字典<一> {
pub fn put_and_get_key(& mut self,value:&'a str) - > u32 {
match self.values_map.get_mut(value){
None => {
let id_usize = self.keys_map.len();
让id = id_usize为u32;
self.keys_map.push(value);
self.values_map.insert(value,id);
id
},
一些(& mut id)=> id
}
}
}
事实证明, str
需要存储在某个地方,最好在同一个 struct
中。我尝试在 Vec
和&'一个str中存储
Box< str>
HashMap
。
pub结构字典<'一> {
values_map:HashMap<&'a str,u32>,
keys_map:Vec< Box< str>>
借用检查器当然不允许这样做,因为它会允许一个悬挂当一个项目从 Vec
中被移除时(或者实际上有时当另一个项目被添加到该项目时, HashMap
Vec
但这里是一个焦点话题。)
我知道我需要写不安全
代码或使用某种形式的共享所有权,其中最简单的一种似乎是 Rc
。使用 Rc< Box< str>>
看起来像引入了双重间接,但似乎没有简单的方法来构造 Rc< str>
pub结构字典{
values_map:HashMap< Rc< Box< ; str>> ;, u32>,
keys_map:Vec< Rc< Box< str>>>
}
impl字典{
pub fn put_and_get_key(& mut self,value:& str) - > u32 {
match self.values_map.get_mut(value){
None => {
let id_usize = self.keys_map.len();
让id = id_usize为u32;
let value_to_store = Rc :: new(value.to_owned()。into_boxed_str());
self.keys_map.push(value_to_store);
self.values_map.insert(value_to_store,id);
id
},
一些(& mut id)=> id
}
}
}
到所有权语义,但是上面的代码不能编译,因为 HashMap
现在需要一个 Rc
,而不是:
$ b
error [E0277]:不符合`std :: rc :: Rc< Box< str>>:std :: borrow :: Borrow< str>`的特征
- > src / file_structure / sample_dictionary.rs:14:31
|
14 |匹配self.values_map.get_mut(值){
| ^ std :: rc :: Rc< Box< str>>`
|
= help:找到以下实现:
= help:< std :: rc :: Rc< T> as std :: borrow :: Borrow< T>>
问题:
- 有没有办法构建
Rc< str>
? - 哪些其他结构,方法或方法可以帮助解决这个问题。从本质上讲,我需要一种方法来有效地存储两个映射
字符串的id
和id-by-string
并且是能够通过& str
检索一个id,即没有任何过多的分配。
有没有办法构建
Rc< str>
?
令人烦恼的是,并不是我所知道的。 Rc :: new
需要一个 Sized
参数,我不确定它是实际的限制还是只是某种被遗忘了。
哪些其他结构,方法或方法可以帮助解决这个问题?
< blockquote>
如果您查看
get
您会发现:
< code>> fn< Q:?Sized>(& self,k:& Q) - >选项<&安培; V>
其中K:借贷,Q:哈希+ Eq
结果,如果
K
实现借用< str> $ c,您可以通过
& str
进行搜索$ c $。
$ b
字符串
执行借用< str>
,所以最简单的解决方法是简单地使用String
作为关键字。当然,这意味着你实际上会有两个String
而不是一个...但这很简单。当然,一个String
比Box< str>
更简单(尽管它再多用8个字节)。
如果您想削减这笔费用,您可以使用自定义结构:
#[derive(Clone,Debug)]
struct RcStr(Rc< String>);
然后执行
借用< str>
它。然后,您将为每个密钥分配2次(1为Rc
,1为String
)。根据你的String
的大小,它可能会占用更少或更多的内存。
如果你想进一步(为什么不?),这里有一些想法:
- 实现自己的引用计数字符串,在单个堆分配中,
- 在插入到
Dictionary
,
- ...
I come from a Java/C#/JavaScript background and I am trying to implement a
Dictionary
that would assign each passed string an id that never changes. The dictionary should be able to return a string by the specified id. This allows to store some data that has a lot of repetitive strings far more efficiently in the file system because only the ids of strings would be stored instead of entire strings.I thought that a struct with a
HashMap
and aVec
would do but it turned out to be more complicated than that.I started with the usage of
&str
as a key forHashMap
and an item ofVec
like in the following sample. The value ofHashMap
serves as an index intoVec
.pub struct Dictionary<'a> { values_map: HashMap<&'a str, u32>, keys_map: Vec<&'a str> } impl<'a> Dictionary<'a> { pub fn put_and_get_key(&mut self, value: &'a str) -> u32 { match self.values_map.get_mut(value) { None => { let id_usize = self.keys_map.len(); let id = id_usize as u32; self.keys_map.push(value); self.values_map.insert(value, id); id }, Some(&mut id) => id } } }
This works just fine until it turns out that the
str
s need to be stored somewhere, preferably in this samestruct
as well. I tried to store aBox<str>
in theVec
and&'a str
in theHashMap
.pub struct Dictionary<'a> { values_map: HashMap<&'a str, u32>, keys_map: Vec<Box<str>> }
The borrow checker did not allow this of course because it would have allowed a dangling pointer in the
HashMap
when an item is removed from theVec
(or in fact sometimes when another item is added to theVec
but this is an off-topic here).I understood that I either need to write
unsafe
code or use some form of shared ownership, the simplest kind of which seems to be anRc
. The usage ofRc<Box<str>>
looks like introducing double indirection but there seems to be no simple way to construct anRc<str>
at the moment.pub struct Dictionary { values_map: HashMap<Rc<Box<str>>, u32>, keys_map: Vec<Rc<Box<str>>> } impl Dictionary { pub fn put_and_get_key(&mut self, value: &str) -> u32 { match self.values_map.get_mut(value) { None => { let id_usize = self.keys_map.len(); let id = id_usize as u32; let value_to_store = Rc::new(value.to_owned().into_boxed_str()); self.keys_map.push(value_to_store); self.values_map.insert(value_to_store, id); id }, Some(&mut id) => id } } }
Everything seems fine with regard to ownership semantics, but the code above does not compile because the
HashMap
now expects anRc
, not an&str
:error[E0277]: the trait bound `std::rc::Rc<Box<str>>: std::borrow::Borrow<str>` is not satisfied --> src/file_structure/sample_dictionary.rs:14:31 | 14 | match self.values_map.get_mut(value) { | ^^^^^^^ the trait `std::borrow::Borrow<str>` is not implemented for `std::rc::Rc<Box<str>>` | = help: the following implementations were found: = help: <std::rc::Rc<T> as std::borrow::Borrow<T>>
Questions:
- Is there a way to construct an
Rc<str>
?- Which other structures, methods or approaches could help to resolve this problem. Essentially, I need a way to efficiently store two maps
string-by-id
andid-by-string
and be able to retrieve an id by&str
, i.e. without any excessive allocations.
解决方案Is there a way to construct an
Rc<str>
?Annoyingly, not that I know of.
Rc::new
requires aSized
argument, and I am not sure whether it is an actual limitation, or just something which was forgotten.Which other structures, methods or approaches could help to resolve this problem?
If you look at the signature of
get
you'll notice:fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V> where K: Borrow<Q>, Q: Hash + Eq
As a result, you could search by
&str
ifK
implementsBorrow<str>
.
String
implementsBorrow<str>
, so the simplest solution is to simply useString
as a key. Sure it means you'll actually have twoString
instead of one... but it's simple. Certainly, aString
is simpler to use than aBox<str>
(although it uses 8 more bytes).If you want to shave off this cost, you can use a custom structure instead:
#[derive(Clone, Debug)] struct RcStr(Rc<String>);
And then implement
Borrow<str>
for it. You'll then have 2 allocations per key (1 forRc
and 1 forString
). Depending on the size of yourString
, it might consume less or more memory.
If you wish to got further (why not?), here are some ideas:
- implement your own reference-counted string, in a single heap-allocation,
- use a single arena for the slice inserted in the
Dictionary
,- ...
这篇关于HashMap和Vec之间的str的共享所有权的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!