Faster way than Scanner or BufferedReader to read multiline data from STDIN?

Question

Note: I am currently coding in Java. I am looking to read input data into a string, one line at a time (or more), and I expect a very large number of lines in total.

Right now I have implemented:

Scanner in = new Scanner(System.in);
while (in.hasNextLine()) {
    String[] separated = in.nextLine().split(" ");
    // ... process the 4 or 6 fields
}

because within each line my inputs are space-delimited.

Unfortunately, with millions of lines this process is VERY slow and the Scanner is taking up more time than my data processing, so I looked into the java.io library and found a bunch of possibilities (ByteArrayInputStream, FileInputStream, BufferedInputStream, PipedInputStream), but I'm not sure which one to use. Which one should I use?

To be specific: my data is piped in from a text file, every line has either 4 or 6 words and ends with a newline character, and I need to analyze one line at a time, putting the 4 or 6 words into an array that I can work with temporarily. Data format:

392903840 a c b 293 32.90
382049804 a c 390
329084203 d e r 489 384.90
...
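Since the sample lines separate fields with single spaces, here is a minimal sketch of splitting one such line with `indexOf` instead of a regular expression. The class and method names (`LineSplitter`, `splitOnSpaces`) are hypothetical. Note that `String.split(" ")` already takes a non-regex fast path for a single literal character in JDK 7 and later, so the gain over the question's `split(" ")` may be small; measure before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplitter {
    // Splits one input line on single spaces without using a regular expression.
    // Assumes fields are separated by exactly one space, as in the sample data.
    // Unlike split(" "), trailing empty fields are kept rather than dropped.
    static String[] splitOnSpaces(String line) {
        List<String> fields = new ArrayList<>(6); // each line has 4 or 6 fields
        int start = 0;
        int space;
        while ((space = line.indexOf(' ', start)) >= 0) {
            fields.add(line.substring(start, space));
            start = space + 1;
        }
        fields.add(line.substring(start)); // the field after the last space
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] six = splitOnSpaces("392903840 a c b 293 32.90");
        String[] four = splitOnSpaces("382049804 a c 390");
        System.out.println(six.length + " " + four.length); // prints "6 4"
        System.out.println(six[5]);                          // prints "32.90"
    }
}
```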

Is there a way for Scanner to read 1000 or so lines at a time and become efficient, or which of these classes should I use (to maximize speed)?

Side note: while experimenting, I also tried:

java.io.BufferedReader stdin = new java.io.BufferedReader(new java.io.InputStreamReader(System.in));
while (stdin.ready()) {
    String[] separated = stdin.readLine().split(" ");
    // ... process the 4 or 6 fields
}

That worked well. I'm just wondering which approach works best, and whether there's any way to, say, read 100 lines at once and then process them all. There are too many options to tell which one is optimal.
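Reading "100 lines at once" can be sketched as filling a list with up to N lines from a `BufferedReader` and then processing that batch. This is a minimal sketch; `BatchReader` and `readBatch` are hypothetical names, and a `StringReader` stands in for `System.in`. Keep in mind that `BufferedReader` already reads ahead in large chunks internally, so batching lines in your own code mostly changes how the program is structured rather than reducing I/O cost.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BatchReader {
    // Clears batch, then reads up to maxLines lines into it; returns how many were read.
    // A return value of 0 means the end of the stream was reached.
    static int readBatch(BufferedReader reader, int maxLines, List<String> batch) throws IOException {
        batch.clear();
        String line;
        // The size check comes first so no line is consumed once the batch is full.
        while (batch.size() < maxLines && (line = reader.readLine()) != null) {
            batch.add(line);
        }
        return batch.size();
    }

    public static void main(String[] args) throws IOException {
        // For piped input use: new BufferedReader(new InputStreamReader(System.in))
        BufferedReader reader = new BufferedReader(new StringReader(
                "392903840 a c b 293 32.90\n382049804 a c 390\n329084203 d e r 489 384.90\n"));
        List<String> batch = new ArrayList<>();
        int batches = 0;
        int totalFields = 0;
        while (readBatch(reader, 2, batch) > 0) {
            batches++;
            for (String line : batch) {
                totalFields += line.split(" ").length; // process each line of the batch here
            }
        }
        System.out.println(batches + " batches, " + totalFields + " fields"); // prints "2 batches, 16 fields"
    }
}
```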

Recommended answer

You should wrap your System.in in a BufferedInputStream, like this:

BufferedInputStream bis = new BufferedInputStream(System.in);
Scanner in = new Scanner(bis);

because this minimises the number of read calls made on System.in (the BufferedInputStream fetches bytes in larger chunks), which improves efficiency.
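A self-contained sketch of the Scanner-over-BufferedInputStream setup, applied to the question's 4-or-6-field lines. The method name `countFieldCounts` is hypothetical, and a `ByteArrayInputStream` stands in for `System.in` so the example runs on its own. Scanner already keeps its own internal character buffer, so the speedup from the extra `BufferedInputStream` may be modest; it is worth timing against the plain version.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class BufferedScannerDemo {
    // Counts lines with 4 and with 6 space-separated fields, reading through
    // a Scanner that wraps a BufferedInputStream around the given source.
    static int[] countFieldCounts(InputStream source) {
        int fourField = 0;
        int sixField = 0;
        Scanner in = new Scanner(new BufferedInputStream(source), "UTF-8");
        while (in.hasNextLine()) {
            String[] separated = in.nextLine().split(" ");
            if (separated.length == 4) {
                fourField++;
            } else if (separated.length == 6) {
                sixField++;
            }
        }
        return new int[] {fourField, sixField};
    }

    public static void main(String[] args) {
        // For piped input, call countFieldCounts(System.in); sample bytes stand in here.
        byte[] sample = ("392903840 a c b 293 32.90\n"
                + "382049804 a c 390\n"
                + "329084203 d e r 489 384.90\n").getBytes(StandardCharsets.UTF_8);
        int[] counts = countFieldCounts(new ByteArrayInputStream(sample));
        System.out.println("4 fields: " + counts[0] + ", 6 fields: " + counts[1]);
        // prints "4 fields: 1, 6 fields: 2"
    }
}
```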

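Also, if you're only reading lines, you don't really need a Scanner, but a BufferedReader: its readLine() method returns the next line, or null once the end of the stream is reached. (Reader also has a ready() method, but it only reports whether the next read is guaranteed not to block, so it is not a reliable end-of-input check for piped data.)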

You would use it like this (see the example in the Java 6 InputStreamReader documentation):

(I added a buffer size argument of 32*1024*1024 characters to the BufferedReader; since a Java char is 2 bytes, that buffer takes about 64 MB of heap.)

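BufferedReader br = new BufferedReader(new InputStreamReader(System.in), 32 * 1024 * 1024);
String line;
// readLine() returns null only at end of stream; ready() can return false
// on a pipe before all of the input has arrived, ending the loop early
while ((line = br.readLine()) != null) {
    // process line
}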

From the BufferedReader documentation page:


Without buffering, each invocation of read() or readLine() could cause bytes to be read from the file, converted into characters, and then returned, which can be very inefficient.
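The quoted passage is about the cost of reading bytes and converting them to characters on each call. One further step, which is not part of the answer above and is shown only as a hypothetical sketch, is to bypass Reader entirely: read raw bytes from the stream in large chunks and build a String only once per line. It assumes single-byte (ASCII) input like the sample data and '\n' line endings; the class `ByteLineReader` is made up for illustration, and whether it beats a plain BufferedReader on your data is something to measure, not assume.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteLineReader {
    private final InputStream in;
    private final byte[] buf = new byte[1 << 16]; // 64 KiB requested per read call
    private int pos = 0;                          // next unread byte in buf
    private int len = 0;                          // number of valid bytes in buf
    private byte[] lineBuf = new byte[128];       // current line; grows for long lines

    ByteLineReader(InputStream in) {
        this.in = in;
    }

    // Returns the next line without its '\n', or null at end of stream.
    // Assumes single-byte (ASCII) text; a trailing '\r' is NOT stripped.
    String readLine() throws IOException {
        int lineLen = 0;
        boolean sawAny = false;
        while (true) {
            if (pos == len) {                     // chunk exhausted: refill it
                len = in.read(buf, 0, buf.length);
                pos = 0;
                if (len <= 0) {                   // end of stream
                    len = 0;
                    // return a final line that had no trailing '\n', if any
                    return sawAny ? new String(lineBuf, 0, lineLen, StandardCharsets.US_ASCII) : null;
                }
            }
            byte b = buf[pos++];
            sawAny = true;
            if (b == '\n') {
                return new String(lineBuf, 0, lineLen, StandardCharsets.US_ASCII);
            }
            if (lineLen == lineBuf.length) {
                lineBuf = Arrays.copyOf(lineBuf, lineLen * 2);
            }
            lineBuf[lineLen++] = b;
        }
    }

    public static void main(String[] args) throws IOException {
        // For piped input use new ByteLineReader(System.in); sample bytes stand in here.
        byte[] sample = "392903840 a c b 293 32.90\n382049804 a c 390\n"
                .getBytes(StandardCharsets.US_ASCII);
        ByteLineReader reader = new ByteLineReader(new ByteArrayInputStream(sample));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line.split(" ").length); // prints 6, then 4
        }
    }
}
```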
