Is this the right way to read lines from a file and split them into words in Rust?

Question

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.

I've implemented the following function to return the words from a file in a two-dimensional data structure:

fn read_terms() -> Vec<Vec<String>> {
    let path = Path::new("terms.txt");
    let mut file = BufferedReader::new(File::open(&path));
    return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}

Is this the right, idiomatic and efficient way to do it in Rust? I'm wondering whether collect() needs to be called this often, and whether it's necessary to call to_string() here and allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?

Recommended answer

You could instead read the entire file into a single String and then build a structure of references that point to the words inside it:

use std::io::{self, Read};
use std::fs::File;

fn filename_to_string(s: &str) -> io::Result<String> {
    let mut file = File::open(s)?;
    let mut s = String::new();
    file.read_to_string(&mut s)?;
    Ok(s)
}

fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines().map(|line| {
        line.split_whitespace().collect()
    }).collect()
}

fn example_use() {
    let whole_file = filename_to_string("terms.txt").unwrap();
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl)
}

This reads the file with less overhead because it can slurp it into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each word. It also uses less memory, because a single large String plus vectors of references is more compact than many individual Strings.
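For comparison, the line-by-line approach described above might look roughly like this in post-1.0 Rust. This is only a sketch, not the asker's original code: the names `words_from_reader` and `read_terms_buffered` are hypothetical, and it returns an `io::Result` instead of unwrapping. The comments mark where each copy happens.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Hypothetical helper: collects owned words from any buffered reader.
fn words_from_reader<R: BufRead>(reader: R) -> io::Result<Vec<Vec<String>>> {
    let mut rows = Vec::new();
    for line in reader.lines() {
        // `lines()` yields io::Result<String>: one freshly allocated String per line.
        let line = line?;
        // Each word is then copied again into its own String.
        let words: Vec<String> = line.split_whitespace().map(String::from).collect();
        rows.push(words);
    }
    Ok(rows)
}

// Hypothetical wrapper: opens `path` and reads it through a BufReader.
fn read_terms_buffered(path: &str) -> io::Result<Vec<Vec<String>>> {
    let file = File::open(path)?;
    words_from_reader(BufReader::new(file))
}
```

The result owns all of its data, so it can be returned from a function without lifetime concerns, at the cost of the per-line and per-word allocations.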

A drawback is that you can't directly return the structure of references, because it can't outlive the stack frame that holds the single large String. In example_use above, we have to bind the large String with let before calling words_by_line. It is possible to work around this by wrapping the String and the references in a private struct and using unsafe code, but that is much more complicated.
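One safe alternative, offered only as a sketch and not as part of the original answer, is to store byte ranges instead of references, so that the owning struct can be returned freely. The names `WordsByLine`, `line`, `line_count` and `load_words` are hypothetical. The offsets are computed from pointer addresses, which relies on every word being a subslice of the same String.

```rust
use std::io;
use std::ops::Range;

// Hypothetical type: owns the file contents and stores each word as a
// byte range into `text` rather than as a borrowed &str.
struct WordsByLine {
    text: String,
    lines: Vec<Vec<Range<usize>>>,
}

impl WordsByLine {
    fn new(text: String) -> Self {
        // Address of the first byte of the buffer, used to compute offsets.
        let base = text.as_ptr() as usize;
        let lines: Vec<Vec<Range<usize>>> = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|word| {
                        // `word` is a subslice of `text`, so this is its byte offset.
                        let start = word.as_ptr() as usize - base;
                        start..start + word.len()
                    })
                    .collect()
            })
            .collect();
        // The ranges hold no borrow, so `text` can be moved into the struct.
        WordsByLine { text, lines }
    }

    // Number of lines found in the text.
    fn line_count(&self) -> usize {
        self.lines.len()
    }

    // Borrowed view of the words on line `i`, or None if `i` is out of range.
    fn line(&self, i: usize) -> Option<Vec<&str>> {
        self.lines
            .get(i)
            .map(|ranges| ranges.iter().map(|r| &self.text[r.clone()]).collect())
    }
}

// Hypothetical loader: reads the whole file and returns the owning struct.
fn load_words(path: &str) -> io::Result<WordsByLine> {
    Ok(WordsByLine::new(std::fs::read_to_string(path)?))
}
```

A caller would keep the returned `WordsByLine` around and ask it for `line(i)` when needed. The trade-off is that each lookup rebuilds a small vector of `&str` views instead of reusing stored references.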
