Why is Java List traversal slower than file readline?

Question

I have this code:

while ((line = br.readLine()) != null) {
    String[] words = line.split(" ");    // split the line into words
    outputLine = SomeAlgorithm(words);   // transform the line
    output.write(outputLine);            // write the result straight away
}

As the code above shows, for every line in the input file I read the line, run an algorithm on it that modifies it, and then write the resulting output line to a file.

The file has 9k lines, and the entire program took 3 minutes on my machine.

I thought: I'm doing 2 I/Os for every run of the algorithm (one per line), so around 18k I/Os in total. Why not collect all the lines into an ArrayList first, then loop through the list and run the algorithm on each line? I could likewise collect every output into one String variable and write it all out once at the end of the program.

That way the whole program would do 2 big file I/Os instead of 18k small ones. I expected this to be faster, so I wrote this:

List<String> lines = new ArrayList<>();
while ((line = br.readLine()) != null) {
    lines.add(line); // collect all lines first
}

for (String line : lines) {
    String[] words = line.split(" ");
    bigOutput += SomeAlgorithm(words); // collect all output
}

output.write(bigOutput);

But this version took 7 minutes!

So why is traversing an ArrayList slower than reading the file line by line?

Note: Collecting all the lines with readLine() and writing out bigOutput each take only a few seconds, and SomeAlgorithm() has not changed. So I believe the culprit must be for (String line : lines).

Update: As pointed out in the comments below, the problem was not the ArrayList traversal but the way the output was accumulated with +=. Switching to a StringBuilder gave a result faster than the original version.

Recommended answer

I suspect the difference in performance comes from how you collect the output in a single variable (bigOutput). My conjecture is that each += allocates a new String and copies all the character data accumulated so far, and that this repeated reallocation and copying, not the list traversal, is the real cause of the slowness.
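To make the accumulation issue concrete, here is a minimal sketch that runs the same loop twice: once with += on a String, as in the question, and once with a StringBuilder, as in the update. The class name OutputAccumulation, the helper names collectWithConcat and collectWithBuilder, and the body of someAlgorithm are all assumptions made for illustration; the question does not show the real SomeAlgorithm().

```java
import java.util.Arrays;
import java.util.List;

public class OutputAccumulation {

    // Hypothetical stand-in for the question's SomeAlgorithm();
    // the real implementation is not shown in the question.
    public static String someAlgorithm(String[] words) {
        return String.join("_", words) + "\n";
    }

    // Pattern from the question: String is immutable, so each += creates
    // a new String and copies everything accumulated so far.
    public static String collectWithConcat(List<String> lines) {
        String bigOutput = "";
        for (String line : lines) {
            bigOutput += someAlgorithm(line.split(" "));
        }
        return bigOutput;
    }

    // Same loop, but appending into one growable buffer and
    // building the final String only once at the end.
    public static String collectWithBuilder(List<String> lines) {
        StringBuilder bigOutput = new StringBuilder();
        for (String line : lines) {
            bigOutput.append(someAlgorithm(line.split(" ")));
        }
        return bigOutput.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b c", "d e");
        // Both approaches produce the same text; only the cost differs.
        System.out.println(collectWithConcat(lines).equals(collectWithBuilder(lines)));
    }
}
```

With only two lines the difference is invisible, but with += the amount of copying grows roughly with the square of the total output size, so it can come to dominate when the combined output is large. The StringBuilder version keeps the "write once at the end" idea from the question without that cost.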
