Is this the right way to read lines from a file and split them into words in Rust?

Posted 2021/7/13 20:51:56, tagged rust

Question

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.

I've implemented the following method, which returns the words from a file in a two-dimensional data structure:

fn read_terms() -> Vec<Vec<String>> {
    let path = Path::new("terms.txt");
    let mut file = BufferedReader::new(File::open(&path));
    return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}
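For readers on current Rust, here is a sketch of how the question's approach might look with the Rust 1.0+ standard library, assuming the path is passed in as a `&str`: `BufferedReader` became `BufReader` (its `lines()` method comes from the `BufRead` trait), `.words()` became `split_whitespace()`, and `as_slice()` is no longer needed. The name `read_terms` is kept from the question; returning `io::Result` instead of calling `unwrap()` is an assumption of this sketch.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Sketch of the question's approach in Rust 1.0+ syntax: read line by line
// through a BufReader and copy every word into its own String.
fn read_terms(path: &str) -> io::Result<Vec<Vec<String>>> {
    let reader = BufReader::new(File::open(path)?);
    reader
        .lines() // yields io::Result<String>, one allocation per line
        .map(|line| {
            // Forward a read error; otherwise copy each word into a String.
            line.map(|l| l.split_whitespace().map(|w| w.to_string()).collect())
        })
        .collect() // builds Result<Vec<_>, _>, stopping at the first error
}
```

This still allocates one `String` per line and one per word, which is the cost the answer below discusses. Both `collect()` calls are needed, because each one builds one level of the nested `Vec`.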

Is this the right, idiomatic and efficient way in Rust? I'm wondering if collect() needs to be called so often and whether it's necessary to call to_string() here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?

Answer

You could instead read the entire file as a single String and then build a structure of references that points to the words inside:

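use std::fs::File;
use std::io::{self, Read};

// Read the whole file into one String, propagating any I/O error.
fn filename_to_string(path: &str) -> io::Result<String> {
    let mut file = File::open(path)?;
    let mut s = String::new();
    file.read_to_string(&mut s)?;
    Ok(s)
}

// Split each line into whitespace-separated words. The returned &str slices
// borrow from `s`, so no per-word String is allocated.
fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
    s.lines()
        .map(|line| line.split_whitespace().collect())
        .collect()
}

fn example_use() {
    // `whole_file` must outlive `wbyl`, because `wbyl` borrows from it.
    let whole_file = filename_to_string("terms.txt").unwrap();
    let wbyl = words_by_line(&whole_file);
    println!("{:?}", wbyl);
}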
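Since Rust 1.26 the standard library has `std::fs::read_to_string`, which opens a file and reads it into a `String` in one call. A minimal sketch of the same example using it, with the helper names kept from the answer:

```rust
use std::fs;
use std::io;

// fs::read_to_string does the open + read_to_string work in one call.
fn filename_to_string(path: &str) -> io::Result<String> {
    fs::read_to_string(path)
}

// Same borrowing splitter as above; lifetime elision ties the output
// slices to `s`, so the explicit 'a is optional here.
fn words_by_line(s: &str) -> Vec<Vec<&str>> {
    s.lines()
        .map(|line| line.split_whitespace().collect())
        .collect()
}
```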

This reads the file with less overhead because it can slurp it into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each word. It also uses less memory, because the single large String and vectors of references are more compact than many individual Strings.
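To put rough numbers on the memory claim, the hypothetical helper `estimate` below counts the bytes each representation needs for the words themselves. The borrowed form stores the text once plus a `&str` (pointer and length, two machine words) per word; the owned form stores a `String` header (pointer, capacity and length, three machine words) plus a separate copy of each word's bytes. The sketch deliberately ignores the outer `Vec` buffers, which have the same shape in both versions, and per-allocation allocator overhead, which only adds to the owned side.

```rust
use std::mem::size_of;

// Rough estimate of bytes needed to hold every word of `text`:
// (borrowed, owned). Ignores allocator overhead and Vec spare capacity.
fn estimate(text: &str) -> (usize, usize) {
    let words: Vec<&str> = text.split_whitespace().collect();
    // Borrowed: the whole text once, plus one (ptr, len) pair per word.
    let borrowed = text.len() + words.len() * size_of::<&str>();
    // Owned: one (ptr, cap, len) header per word plus a copy of its bytes.
    let owned: usize = words
        .iter()
        .map(|w| size_of::<String>() + w.len())
        .sum();
    (borrowed, owned)
}
```

For example, on a 64-bit target `estimate("alpha beta gamma\ndelta")` returns `(86, 115)`.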

A drawback is that you can't directly return the structure of references, because it can't outlive the stack frame that holds the single large String. In example_use above, we have to bind the large String with let in order to call words_by_line. It is possible to get around this with unsafe code by wrapping the String and the references in a private struct, but that is much more complicated.
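One safe alternative (a different technique from the unsafe approach the answer mentions) is to store byte offsets instead of references. The hypothetical `Terms` type below owns the file contents and, for each line, the `Range<usize>` of every word. Because the ranges are plain integers, the struct has no lifetime parameter and can be returned from a function; words are turned back into `&str` slices only when requested.

```rust
use std::fs;
use std::io;
use std::ops::Range;

// Hypothetical owning type: the file text plus, per line, the byte ranges
// of its words. No references are stored, so it can be returned freely.
struct Terms {
    text: String,
    lines: Vec<Vec<Range<usize>>>,
}

impl Terms {
    fn new(text: String) -> Terms {
        let base = text.as_ptr() as usize;
        let lines: Vec<Vec<Range<usize>>> = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|word| {
                        // `word` is a sub-slice of `text`, so the pointer
                        // difference is its byte offset within `text`.
                        let start = word.as_ptr() as usize - base;
                        start..start + word.len()
                    })
                    .collect()
            })
            .collect();
        // The borrow of `text` has ended, so it can be moved in now.
        Terms { text, lines }
    }

    // Read a file and build the index; the result owns everything it needs.
    fn from_file(path: &str) -> io::Result<Terms> {
        Ok(Terms::new(fs::read_to_string(path)?))
    }

    // Borrow the words of line `i` as slices of the owned text.
    fn words(&self, i: usize) -> Vec<&str> {
        self.lines[i].iter().map(|r| &self.text[r.clone()]).collect()
    }

    fn line_count(&self) -> usize {
        self.lines.len()
    }
}
```

The trade-off is an extra indexing step on each access. The offset computation relies on `lines()` and `split_whitespace()` returning sub-slices of the original `text`, which they do.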
