Reading an input stream and splitting it based on a delimiter


Question

I have a scenario where I receive a large amount of data as an input stream. The data contains a delimiter, and I need to split the stream on it and process each piece. If possible, I want to do this entirely in memory. Right now I do this with a Scanner, as shown in the code below:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Scanner;

public class ScannerTest {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes both the stream and the scanner, even if processing fails.
        // A missing file now aborts with an exception instead of continuing with a null stream.
        try (FileInputStream fin = new FileInputStream("E:\\Project\\Journalling\\docs\\readFile.txt");
             Scanner scanner = new Scanner(fin, "UTF-8").useDelimiter("--AABBCCDDEEFFGGHHIIaabbccdd")) {
            while (scanner.hasNext()) {
                String theString = scanner.next();
                System.out.println(theString);
                functionToProcessStreams(theString); // This will actually do the processing.
            }
        }
    }

    // Placeholder: the real processing logic is not shown in the question.
    private static void functionToProcessStreams(String piece) {
    }
}

However, I am not sure whether this is the most efficient way to do it. Another option that comes to mind is to call read(b, off, len) on the InputStream and then process each byte array. For that, however, I would need to know the indexes of the delimiters, which might mean reading the entire stream a second time.
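For comparison, the read(b, off, len) idea can be sketched so that the delimiter is located during the same pass that reads the data, without a second scan. This is only an illustration under assumptions: the class name ByteDelimiterSplitter, the split and endsWith helpers, the 8192-byte buffer size, and the choice to decode each piece as UTF-8 are all hypothetical and do not come from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ByteDelimiterSplitter {

    // Reads the stream once with read(b, off, len) and cuts a piece every time
    // the bytes collected so far end with the delimiter.
    static List<String> split(InputStream in, byte[] delimiter) throws IOException {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        byte[] chunk = new byte[8192];   // buffer filled by read(b, off, len)
        byte[] pending = new byte[8192]; // bytes of the piece being assembled
        int pendingLen = 0;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (pendingLen == pending.length) {
                    pending = Arrays.copyOf(pending, pendingLen * 2); // grow as needed
                }
                pending[pendingLen++] = chunk[i];
                if (endsWith(pending, pendingLen, delimiter)) {
                    // Emit the piece without the trailing delimiter, then start over.
                    parts.add(new String(pending, 0, pendingLen - delimiter.length,
                            StandardCharsets.UTF_8));
                    pendingLen = 0;
                }
            }
        }
        if (pendingLen > 0) {
            // Whatever follows the last delimiter is the final piece.
            parts.add(new String(pending, 0, pendingLen, StandardCharsets.UTF_8));
        }
        return parts;
    }

    // True if the first len bytes of buf end with suffix.
    static boolean endsWith(byte[] buf, int len, byte[] suffix) {
        if (len < suffix.length) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (buf[len - suffix.length + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first--SEP--second--SEP--third".getBytes(StandardCharsets.UTF_8);
        byte[] delimiter = "--SEP--".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(new ByteArrayInputStream(data), delimiter));
    }
}
```

Because the partially assembled piece is kept across read calls, a delimiter that straddles two reads is still found. The sketch collects all pieces in a list for brevity; with a large stream, each piece would more likely be handed to the processing function as soon as it is cut.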

Please suggest a better way to do this, if there is one.

Recommended answer

Using Scanner with useDelimiter() is efficient: it compiles the delimiter string into a regular expression and reads your input only once.
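Since useDelimiter(String) treats its argument as a regular expression, a delimiter containing regex metacharacters would need escaping. A minimal sketch of that variation follows; the class name DelimiterSplitDemo, the split helper, and the sample "--SEP--" delimiter are made up for illustration, and Pattern.quote is an addition not mentioned in the answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class DelimiterSplitDemo {

    // Splits the stream on a literal delimiter in a single pass.
    // Pattern.quote keeps regex metacharacters in the delimiter from being interpreted.
    static List<String> split(InputStream in, String delimiter) {
        List<String> parts = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter(Pattern.quote(delimiter));
            while (scanner.hasNext()) {
                parts.add(scanner.next());
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        byte[] data = "first--SEP--second--SEP--third".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(new ByteArrayInputStream(data), "--SEP--"));
    }
}
```

The delimiter in the question contains no metacharacters, so quoting changes nothing there; it only matters for delimiters such as "|" or ".".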

On a side note: even if it costs a bit of efficiency, it is always a good idea to write legible code. It lets you adapt your code faster and make fewer mistakes. Premature optimization is the root of all evil.
