Return lazy iterator that depends on data allocated within the function

Problem description

I am new to Rust and reading The Rust Programming Language, and in the Error Handling section there is a "case study" describing a program to read data from a CSV file using the csv and rustc-serialize libraries (using getopts for argument parsing).

The author writes a function search that steps through the rows of the CSV file using a csv::Reader object, collects the entries whose 'city' field matches a specified value into a vector, and returns that vector. I've taken a slightly different approach than the author, but this should not affect my question. My (working) function looks like this:

extern crate csv;
extern crate rustc_serialize;

use std::path::Path;
use std::fs::File;

fn search<P>(data_path: P, city: &str) -> Vec<DataRow>
    where P: AsRef<Path>
{
    let file = File::open(data_path).expect("Opening file failed!");
    let mut reader = csv::Reader::from_reader(file).has_headers(true);

    reader.decode()
          .map(|row| row.expect("Failed decoding row"))
          .filter(|row: &DataRow| row.city == city)
          .collect()
}

where the DataRow type is just a record,

#[derive(Debug, RustcDecodable)]
struct DataRow {
    country: String,
    city: String,
    accent_city: String,
    region: String,
    population: Option<u64>,
    latitude: Option<f64>,
    longitude: Option<f64>
}

Now, the author poses, as the dreaded "exercise to the reader", the problem of modifying this function to return an iterator instead of a vector (eliminating the call to collect). My question is: How can this be done at all, and what are the most concise and idiomatic ways of doing it?

A simple attempt that I think gets the type signature right is

fn search_iter<'a,P>(data_path: P, city: &'a str)
    -> Box<Iterator<Item=DataRow> + 'a>
    where P: AsRef<Path>
{
    let file = File::open(data_path).expect("Opening file failed!");
    let mut reader = csv::Reader::from_reader(file).has_headers(true);

    Box::new(reader.decode()
                   .map(|row| row.expect("Failed decoding row"))
                   .filter(|row: &DataRow| row.city == city))
}

I return a trait object of type Box<Iterator<Item=DataRow> + 'a> so as not to have to expose the internal Filter type, and where the lifetime 'a is introduced just to avoid having to make a local clone of city. But this fails to compile because reader does not live long enough; it's allocated on the stack and so is deallocated when the function returns.

I guess this means that reader has to be allocated on the heap (i.e. boxed) from the beginning, or somehow moved off the stack before the function ends. If I were returning a closure, this is exactly the problem that would be solved by making it a move closure. But I don't know how to do something similar when I'm not returning a function. I've tried defining a custom iterator type containing the needed data, but I couldn't get it to work, and it kept getting uglier and more contrived (don't make too much of this code, I'm only including it to show the general direction of my attempts):

fn search_iter<'a,P>(data_path: P, city: &'a str)
    -> Box<Iterator<Item=DataRow> + 'a>
    where P: AsRef<Path>
{
    struct ResultIter<'a> {
        reader: csv::Reader<File>,
        wrapped_iterator: Option<Box<Iterator<Item=DataRow> + 'a>>
    }

    impl<'a> Iterator for ResultIter<'a> {
        type Item = DataRow;

        fn next(&mut self) -> Option<DataRow>
        { self.wrapped_iterator.unwrap().next() }
    }

    let file = File::open(data_path).expect("Opening file failed!");

    // Incrementally initialise
    let mut result_iter = ResultIter {
        reader: csv::Reader::from_reader(file).has_headers(true),
        wrapped_iterator: None // Uninitialised
    };
    result_iter.wrapped_iterator =
        Some(Box::new(result_iter.reader
                                 .decode()
                                 .map(|row| row.expect("Failed decoding row"))
                                 .filter(|&row: &DataRow| row.city == city)));

    Box::new(result_iter)
}

This question (https://stackoverflow.com/questions/28774496/conflicting-lifetime-requirement-for-iterator-returned-from-function) seems to concern the same problem, but the author of the answer solves it by making the concerned data static, which I don't think is an alternative for this question.

I am using Rust 1.10.0, the current stable version from the Arch Linux package rust.

Recommended answer

The most direct path to convert the original function would be to simply wrap the iterator. However, doing so directly will lead to problems, because you cannot return an object that refers to itself, and the result of decode refers to the Reader. Even if you could surmount that, you cannot have an iterator return references to itself.

One solution is to simply re-create the DecodedRecords iterator for each call to your new iterator:

fn search_iter<'a, P>(data_path: P, city: &'a str) -> MyIter<'a>
    where P: AsRef<Path>
{
    let file = File::open(data_path).expect("Opening file failed!");

    MyIter {
        reader: csv::Reader::from_reader(file).has_headers(true),
        city: city,
    }
}

struct MyIter<'a> {
    reader: csv::Reader<File>,
    city: &'a str,
}

impl<'a> Iterator for MyIter<'a> {
    type Item = DataRow;

    fn next(&mut self) -> Option<Self::Item> {
        let city = self.city;

        self.reader.decode()
            .map(|row| row.expect("Failed decoding row"))
            .filter(|row: &DataRow| row.city == city)
            .next()
    }
}

This could have overhead associated with it, depending on the implementation of decode. Additionally, this might "rewind" back to the beginning of the input on every call: if you substituted a Vec for the csv::Reader, you would see this. However, it happens to work in this case.
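
The rewinding concern can be seen with a small std-only sketch. The types Rewinding and Resuming are hypothetical stand-ins for MyIter (they are not part of the csv crate): the first rebuilds its inner iterator on every call to next, as the code above does, while the second keeps its position as explicit state.

```rust
// Hypothetical stand-in for MyIter that owns a Vec instead of a csv::Reader.
struct Rewinding {
    data: Vec<u32>,
    min: u32,
}

impl Iterator for Rewinding {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let min = self.min;
        // A fresh iterator over the Vec starts at index 0 on every call,
        // so the same first match is produced each time.
        self.data.iter().cloned().filter(|&x| x >= min).next()
    }
}

// One possible fix (an assumption, not from the answer): remember how far
// we have read, so each call continues where the previous one stopped.
struct Resuming {
    data: Vec<u32>,
    pos: usize,
    min: u32,
}

impl Iterator for Resuming {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        while self.pos < self.data.len() {
            let x = self.data[self.pos];
            self.pos += 1;
            if x >= self.min {
                return Some(x);
            }
        }
        None
    }
}
```

Taking three items from Rewinding { data: vec![1, 5, 7], min: 4 } yields 5 three times and the iterator never ends, whereas Resuming with pos: 0 yields 5, then 7, then stops. A csv::Reader over a File avoids the problem only because the file itself keeps track of the read position.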

Beyond that, I'd normally open the file and create the csv::Reader outside of the function and pass in the DecodedRecords iterator and transform it, returning a newtype / box / type alias around the underlying iterator. I prefer this because the structure of your code mirrors the lifetimes of the objects.
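
A minimal sketch of that structure, using only the standard library. Row and search_rows are hypothetical names, and Row is a cut-down stand-in for DataRow. The sketch assumes a compiler with impl Trait in return position (stable since Rust 1.26); on Rust 1.10 the return type would have to be a Box<Iterator<Item=Row> + 'a> or a newtype instead.

```rust
// Hypothetical, simplified row type; the real DataRow has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    city: String,
    population: Option<u64>,
}

// The caller owns whatever produces the rows (for example the reader);
// this function only adapts an iterator that is handed to it, so nothing
// local to the function has to outlive the call.
fn search_rows<'a, I>(rows: I, city: &'a str) -> impl Iterator<Item = Row> + 'a
where
    I: Iterator<Item = Row> + 'a,
{
    // `move` copies the `&'a str` into the closure so that the returned
    // iterator does not borrow the local variable `city`.
    rows.filter(move |row| row.city == city)
}
```

The caller would create the reader, obtain its row iterator, unwrap or otherwise handle the decoding errors, and pass the result to search_rows. The reader then lives in the caller's scope for as long as the filtered iterator is in use.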

I'm a little surprised that there isn't an implementation of IntoIterator for csv::Reader, which would also solve the problem because there would not be any references.
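
As an analogy from the standard library (this is not the csv API), BufRead::lines takes the reader by value, so the resulting iterator owns its source and can be returned from the function that created it. The function name matching_lines is hypothetical, and the sketch again assumes a compiler that supports impl Trait in return position.

```rust
use std::io::{BufRead, BufReader, Read};

// Returns a lazy iterator over the lines of `input` that contain `needle`.
// `lines()` consumes the BufReader, so the iterator owns the reader and
// holds no reference to any local variable of this function.
fn matching_lines<'a, R>(input: R, needle: &'a str) -> impl Iterator<Item = String> + 'a
where
    R: Read + 'a,
{
    BufReader::new(input)
        .lines()
        .map(|line| line.expect("Failed reading line"))
        .filter(move |line| line.contains(needle))
}
```

An owning adapter on csv::Reader would allow search_iter to be written in the same shape, with the reader moved into the returned iterator rather than borrowed by it.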
