Is this the right way to read lines from a file and split them into words in Rust?
Question
Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I've implemented the following function to read a file and return its words in a two-dimensional data structure, one vector of words per line:
fn read_terms() -> Vec<Vec<String>> {
    let path = Path::new("terms.txt");
    let mut file = BufferedReader::new(File::open(&path));
    return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}
Is this the right, idiomatic, and efficient way to do it in Rust? I'm wondering whether collect() needs to be called so often, and whether it's necessary to call to_string() here, which allocates memory for every word. Maybe the return type should be defined differently to be more idiomatic and efficient?
Recommended answer
You could instead read the entire file into a single String and then build a structure of references that point to the words inside it:
use std::io::{self, Read};
use std::fs::File;

fn filename_to_string(s: &str) -> io::Result<String> {
    let mut file = File::open(s)?;
    let mut s = String::new();
    file.read_to_string(&mut s)?;
    Ok(s)
}

fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines().map(|line| {
        line.split_whitespace().collect()
    }).collect()
}

fn example_use() {
    let whole_file = filename_to_string("terms.txt").unwrap();
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl)
}
This reads the file with less overhead because it can slurp the contents into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each word. It also uses less memory, because a single large String plus vectors of references is more compact than many individual Strings.
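For comparison, here is a rough sketch of what the line-by-line approach from the question might look like in current Rust. It is not part of the original answer; the helper name read_terms_from and the path parameter are assumptions made for illustration. The comments mark where the per-line and per-word allocations described above happen.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Hypothetical helper: splits each line from any buffered reader into owned words.
fn read_terms_from<R: BufRead>(reader: R) -> io::Result<Vec<Vec<String>>> {
    reader
        .lines() // yields io::Result<String>: one newly allocated String per line
        .map(|line| {
            line.map(|l| {
                // Each word is copied out of the line into its own String.
                l.split_whitespace()
                    .map(String::from)
                    .collect::<Vec<String>>()
            })
        })
        .collect() // stops at the first I/O error, otherwise gathers all lines
}

// Opens the file at `path` and delegates to the reader-based helper.
fn read_terms(path: &str) -> io::Result<Vec<Vec<String>>> {
    let file = File::open(path)?;
    read_terms_from(BufReader::new(file))
}
```

The result owns all of its data, so it can be returned freely, at the cost of the extra allocations.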
A drawback is that you can't directly return the structure of references, because it can't outlive the stack frame that holds the single large String. In example_use above, we have to bind the large String with a let before calling words_by_line. It is possible to get around this with unsafe code by wrapping the String and the references in a private struct, but that is much more complicated.
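One way to sidestep the lifetime problem without unsafe code is to store byte ranges instead of references and keep them next to the String they index into. This is a different technique from the unsafe wrapper mentioned above, and only a minimal sketch: the Terms type, its methods, and load_terms are hypothetical names introduced here.

```rust
use std::ops::Range;

// Hypothetical wrapper: owns the text and records where each word sits in it.
struct Terms {
    text: String,
    // For each line, the byte ranges of its words within `text`.
    lines: Vec<Vec<Range<usize>>>,
}

impl Terms {
    fn new(text: String) -> Terms {
        let base = text.as_ptr() as usize;
        let lines: Vec<Vec<Range<usize>>> = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|word| {
                        // Each word is a subslice of `text`, so its offset is the
                        // distance between the two start addresses.
                        let start = word.as_ptr() as usize - base;
                        start..start + word.len()
                    })
                    .collect()
            })
            .collect();
        // The borrow of `text` has ended, so it can be moved into the struct.
        Terms { text, lines }
    }

    fn line_count(&self) -> usize {
        self.lines.len()
    }

    // Borrows the words of one line back out of the owned text.
    // Panics if `line` is out of bounds.
    fn words(&self, line: usize) -> Vec<&str> {
        self.lines[line]
            .iter()
            .map(|r| &self.text[r.clone()])
            .collect()
    }
}

// Reads the whole file with the standard library's fs::read_to_string.
fn load_terms(path: &str) -> std::io::Result<Terms> {
    Ok(Terms::new(std::fs::read_to_string(path)?))
}
```

Because Terms owns its String, it can be returned from a function, and the &str slices are only materialized when words is called. The trade-off is a range per word (two usize values, the same size as a &str) plus a bounds-checked slice operation on each access.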