How can I read a file line-by-line, eliminate duplicates, then write back to the same file?

Question

I want to read a file, eliminate all duplicates and write the rest back into the file, like a duplicate cleaner. I'm using a Vec because a normal array has a fixed size but my .txt is flexible (am I doing this right?).

Read the lines into a Vec and delete duplicates; writing back to the file is missing:

use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = io::BufferedReader::new(io::File::open(&path, R));

    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    // dedup() deletes all duplicates if sort() before
    lines.sort();
    lines.dedup();

    for e in lines.iter() {
        print!("{}", e.as_slice());
    }
}

Read + write to file (untested, but it should work, I guess). Collecting the lines into a Vec is missing here because it doesn't seem to work without BufferedReader (or I'm doing something else wrong, which is also quite likely):

use std::io;

fn main() {
    let path = Path::new("test.txt");
    let mut file = match io::File::open_mode(&path, io::Open, io::ReadWrite) {
        Ok(f) => f,
        Err(e) => panic!("file error: {}", e),
    };  
    let mut lines: Vec<String> = file.lines().map(|x| x.unwrap()).collect();
    lines.sort();
    // dedup() deletes all duplicates if sort() before
    lines.dedup();

    for e in lines.iter() {
        file.write("{}", e);
    }
} 

So... how do I put those two together? :)

Recommended answer

Ultimately, you are going to run into a problem: you are trying to write to the same file you are reading from. In this case it's safe, because you read the entire file first and no longer need it afterwards. However, if you did try to write to the file, you'd find that a file opened for reading doesn't allow writing! Here's code that tries it:

use std::{
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(&mut file);

    let mut lines: Vec<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();

    lines.sort();
    lines.dedup();

    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");
    }
}

Here's the output:

% cat test.txt
a
a
b
a
% cargo run
thread 'main' panicked at 'Couldn't write to file: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/main.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

You could open the file for both reading and writing:

use std::{
    fs::OpenOptions,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open("test.txt")
        .expect("file error");

    // Remaining code unchanged
}

But then you'd see that (a) the output is appended, because the file cursor sits at the end of the file after reading, and (b) the newly written lines have lost their newlines, because BufRead::lines doesn't include them.
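Point (b) can be checked in isolation. The following is a minimal sketch, not part of the original answer: the helper name `read_lines` is made up for illustration, and a `Cursor` over a string stands in for the file so nothing on disk is touched.

```rust
use std::io::{BufRead, Cursor};

/// Collects lines the same way the answer's code does.
/// `read_lines` is a hypothetical helper name used only in this sketch.
fn read_lines(input: &str) -> Vec<String> {
    // The Cursor stands in for the file, so no file on disk is needed.
    Cursor::new(input)
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect()
}

fn main() {
    let lines = read_lines("a\na\nb\r\n");

    // Neither "\n" nor "\r\n" survives in the returned strings.
    assert_eq!(lines, vec!["a", "a", "b"]);

    // Writing the strings back as-is therefore runs them together.
    assert_eq!(lines.concat(), "aab");
}
```

This is why the working version further down writes a `b"\n"` after every line.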

We could reset the file pointer back to the beginning, but then you'd probably leave trailing data at the end (deduplicating is likely to write fewer bytes than were read). It's easier to just reopen the file for writing with File::create, which truncates the file. Also, let's use a set data structure to do the deduplication for us!

use std::{
    collections::BTreeSet,
    fs::File,
    io::{BufRead, BufReader, Write},
};

fn main() {
    let file = File::open("test.txt").expect("file error");
    let reader = BufReader::new(file);

    let lines: BTreeSet<_> = reader
        .lines()
        .map(|l| l.expect("Couldn't read a line"))
        .collect();

    let mut file = File::create("test.txt").expect("file error");

    for line in lines {
        file.write_all(line.as_bytes())
            .expect("Couldn't write to file");

        file.write_all(b"\n").expect("Couldn't write to file");
    }
}

Output:

% cat test.txt
a
a
b
a
a
b
a
b

% cargo run
% cat test.txt
a
b
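
For comparison, the "reset the file pointer" route mentioned above can also work if the leftover tail is cut off explicitly. The sketch below is an assumption-laden alternative, not the answer's method: the function name `dedup_in_place` and the scratch file name are made up, and it relies on `Seek::stream_position` and `File::set_len` from the standard library to truncate the file after writing.

```rust
use std::{
    collections::BTreeSet,
    fs::OpenOptions,
    io::{self, BufRead, BufReader, Seek, SeekFrom, Write},
    path::Path,
};

/// Deduplicates (and sorts) the lines of `path` using one read-write handle.
/// `dedup_in_place` is a hypothetical name used only in this sketch.
fn dedup_in_place(path: &Path) -> io::Result<()> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    // Read everything first; the temporary BufReader is dropped afterwards.
    let lines = BufReader::new(&mut file)
        .lines()
        .collect::<io::Result<BTreeSet<String>>>()?;

    // Go back to the start and overwrite the old contents.
    file.seek(SeekFrom::Start(0))?;
    for line in &lines {
        file.write_all(line.as_bytes())?;
        file.write_all(b"\n")?;
    }

    // Cut off whatever is left of the old, longer contents.
    let end = file.stream_position()?;
    file.set_len(end)?;
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical scratch file so the sketch does not touch real data.
    let path = std::env::temp_dir().join("dedup_in_place_demo.txt");
    std::fs::write(&path, "a\na\nb\na\n")?;

    dedup_in_place(&path)?;
    assert_eq!(std::fs::read_to_string(&path)?, "a\nb\n");

    std::fs::remove_file(&path)
}
```

Reopening with File::create, as in the answer, is still the simpler choice; this variant only matters if keeping a single open handle is a requirement.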

A less efficient but shorter solution is to read the entire file into one string and use str::lines:

use std::{
    collections::BTreeSet,
    fs::{self, File},
    io::Write,
};

fn main() {
    let contents = fs::read_to_string("test.txt").expect("can't read");
    let lines: BTreeSet<_> = contents.lines().collect();

    let mut file = File::create("test.txt").expect("can't create");
    for line in lines {
        writeln!(file, "{}", line).expect("can't write");
    }
}
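
Note that both sort-plus-dedup and BTreeSet write the lines back in sorted order. If the original order of first occurrences should be kept, one possible variation (not from the answer) is to filter through a HashSet of lines already seen. The function name `dedup_keep_order` is an illustrative assumption.

```rust
use std::collections::HashSet;

/// Returns the lines of `contents` with later duplicates dropped,
/// keeping the order in which each line first appears.
/// `dedup_keep_order` is a hypothetical name used only in this sketch.
fn dedup_keep_order(contents: &str) -> Vec<&str> {
    let mut seen = HashSet::new();
    contents
        .lines()
        // `insert` returns false when the line was already in the set.
        .filter(|line| seen.insert(*line))
        .collect()
}

fn main() {
    let unique = dedup_keep_order("b\na\nb\na\n");
    assert_eq!(unique, vec!["b", "a"]);
}
```

The resulting lines can then be written back with writeln! exactly as in the snippet above.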

See also:

  • What's the de-facto way of reading and writing files in Rust 1.x?
  • What is the best variant for appending a new line in a text file?
