How can I create an efficient iterator of chars from stdin with Rust?

Views: 131  Published: 2020/9/30 23:02:18  Tags: rust stdin chars


Problem description

Now that the Read::chars iterator has been officially deprecated, what is the proper way to obtain an iterator over the chars coming from a Reader like stdin without reading the entire stream into memory?

Recommended answer

As a couple of others have mentioned, it is possible to copy the deprecated implementation of Read::chars for use in your own code. Whether this is ideal depends on your use case; for me it has proved good enough for now, although my application will likely outgrow this approach in the near future.

To illustrate how this can be done, let's look at a concrete example:
use std::io::{self, Error, ErrorKind, Read};
use std::result;
use std::str;

struct MyReader<R> {
    inner: R,
}

impl<R: Read> MyReader<R> {
    fn new(inner: R) -> MyReader<R> {
        MyReader {
            inner,
        }
    }
}

#[derive(Debug)]
enum MyReaderError {
    NotUtf8,
    Other(Error),
}

impl<R: Read> Iterator for MyReader<R> {
    type Item = result::Result<char, MyReaderError>;

    fn next(&mut self) -> Option<result::Result<char, MyReaderError>> {
        // A clean end of stream yields `None`, which `?` propagates to end iteration.
        let first_byte = match read_one_byte(&mut self.inner)? {
            Ok(b) => b,
            Err(e) => return Some(Err(MyReaderError::Other(e))),
        };
        // The lead byte determines how many bytes the character occupies.
        let width = utf8_char_width(first_byte);
        if width == 1 {
            return Some(Ok(first_byte as char));
        }
        if width == 0 {
            return Some(Err(MyReaderError::NotUtf8));
        }
        // Read the continuation bytes of a multi-byte sequence.
        let mut buf = [first_byte, 0, 0, 0];
        {
            let mut start = 1;
            while start < width {
                match self.inner.read(&mut buf[start..width]) {
                    // The stream ended in the middle of a character.
                    Ok(0) => return Some(Err(MyReaderError::NotUtf8)),
                    Ok(n) => start += n,
                    Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                    Err(e) => return Some(Err(MyReaderError::Other(e))),
                }
            }
        }
        // Validate the collected bytes and extract the single char they encode.
        Some(match str::from_utf8(&buf[..width]).ok() {
            Some(s) => Ok(s.chars().next().unwrap()),
            None => Err(MyReaderError::NotUtf8),
        })
    }
}

The above code also requires read_one_byte and utf8_char_width to be implemented. Those should look something like:
static UTF8_CHAR_WIDTH: [u8; 256] = [
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x1F
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x3F
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x5F
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // 0x7F
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0x9F
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 0xBF
0,0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, // 0xDF
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3, // 0xEF
4,4,4,4,4,0,0,0,0,0,0,0,0,0,0,0, // 0xFF
];

fn utf8_char_width(b: u8) -> usize {
    UTF8_CHAR_WIDTH[b as usize] as usize
}

fn read_one_byte(reader: &mut dyn Read) -> Option<io::Result<u8>> {
    let mut buf = [0];
    loop {
        return match reader.read(&mut buf) {
            Ok(0) => None,
            Ok(..) => Some(Ok(buf[0])),
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => Some(Err(e)),
        };
    }
}

Now we can use the MyReader implementation to produce an iterator of chars over some reader, such as a locked io::Stdin:
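fn main() {
    let stdin = io::stdin();
    let reader = MyReader::new(stdin.lock());
    for c in reader {
        // Each item is a Result, which does not implement Display.
        match c {
            Ok(ch) => println!("{}", ch),
            Err(e) => {
                eprintln!("error: {:?}", e);
                break;
            }
        }
    }
}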
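
Because a byte slice also implements Read, the same decoding approach can be tried without stdin. The following is a condensed sketch and not the code from the answer: CharReader, CharError, width_of and decode_all are hypothetical names, the lookup table is replaced by a match on the lead byte, and read_exact is used for the continuation bytes on the assumption that a truncated sequence should be reported as invalid UTF-8.

```rust
use std::io::{self, ErrorKind, Read};
use std::str;

/// Hypothetical condensed variant of `MyReader`, for illustration only.
struct CharReader<R> {
    inner: R,
}

#[derive(Debug)]
enum CharError {
    NotUtf8,
    Io(io::Error),
}

/// Expected sequence length for a lead byte; 0 means "not a valid lead byte".
fn width_of(b: u8) -> usize {
    match b {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => 0,
    }
}

impl<R: Read> Iterator for CharReader<R> {
    type Item = Result<char, CharError>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = [0u8; 4];
        // Read the lead byte; a clean end of stream ends the iteration.
        loop {
            match self.inner.read(&mut buf[..1]) {
                Ok(0) => return None,
                Ok(_) => break,
                Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Some(Err(CharError::Io(e))),
            }
        }
        let width = width_of(buf[0]);
        if width == 0 {
            return Some(Err(CharError::NotUtf8));
        }
        // Read any continuation bytes; running out of input here means the
        // final character was truncated.
        if width > 1 {
            match self.inner.read_exact(&mut buf[1..width]) {
                Ok(()) => {}
                Err(ref e) if e.kind() == ErrorKind::UnexpectedEof => {
                    return Some(Err(CharError::NotUtf8));
                }
                Err(e) => return Some(Err(CharError::Io(e))),
            }
        }
        // Validate the bytes and extract the single char they encode.
        Some(match str::from_utf8(&buf[..width]) {
            Ok(s) => Ok(s.chars().next().unwrap()),
            Err(_) => Err(CharError::NotUtf8),
        })
    }
}

/// Decode an in-memory byte slice, stopping at the first error.
fn decode_all(bytes: &[u8]) -> Result<String, CharError> {
    CharReader { inner: bytes }.collect()
}

fn main() {
    // Mixed 1-, 2- and 3-byte characters decode correctly.
    let text = decode_all("héllo, 世界".as_bytes()).unwrap();
    assert_eq!(text, "héllo, 世界");
    // A lone continuation byte is rejected.
    assert!(matches!(decode_all(&[0x80]), Err(CharError::NotUtf8)));
    // A 3-byte sequence cut off after two bytes is rejected as well.
    assert!(matches!(decode_all(&[0xE4, 0xB8]), Err(CharError::NotUtf8)));
}
```

Collecting into Result<String, CharError> stops at the first error, which is convenient for small in-memory checks; for a real stream you would normally consume the iterator item by item as in the main function above.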

The limitations of this approach are discussed at length in the original issue thread. One particular concern worth pointing out, however, is that this iterator will not handle non-UTF-8 encoded streams correctly.
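
To make the encoding concern concrete, here is a small sketch using only the standard library. It assumes, purely for illustration, an input encoded as ISO-8859-1 (Latin-1); decode_latin1 is a hypothetical helper and not part of the answer. The same text that is valid in Latin-1 fails UTF-8 validation, which is why a UTF-8 char iterator would report an error on such a stream.

```rust
use std::str;

/// Hypothetical helper: decode ISO-8859-1 (Latin-1) bytes. Every Latin-1
/// byte maps to the Unicode code point with the same numeric value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // "café" in Latin-1: 'é' is the single byte 0xE9.
    let latin1 = [b'c', b'a', b'f', 0xE9];
    // In UTF-8, 0xE9 announces a 3-byte sequence, but no bytes follow it,
    // so validation fails.
    assert!(str::from_utf8(&latin1).is_err());
    // Decoding with the correct encoding recovers the text.
    assert_eq!(decode_latin1(&latin1), "café");

    // The same text in UTF-8 encodes 'é' as the two bytes 0xC3 0xA9.
    let utf8 = [b'c', b'a', b'f', 0xC3, 0xA9];
    assert_eq!(str::from_utf8(&utf8), Ok("café"));
}
```

For input in another encoding, the bytes would need to be transcoded to UTF-8 first, or decoded with a decoder for that encoding.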
