Fastest way to sum integers in text file

Question

Suppose you have a large ASCII text file, with a random non-negative integer on each line, each in the range from 0 to 1,000,000,000. There are 100,000,000 lines in the file. What's the fastest way to read through the file and calculate the sum of all the integers?

Constraint: we've got 10MB of RAM to work with. The file is 1GB in size, so we don't want to read the whole thing in and then process it.

Here are various solutions I've tried. I found the results rather surprising.

Is there anything faster that I've missed?

Please note: all timings given below are for running the algorithm 10 times in total (run once and discard; start timer; run 10 times; stop timer). The machine is a fairly slow Core 2 Duo.
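
For reference, here's a minimal sketch of that timing protocol (the Callable-based wrapper and the method name timeTenRuns are illustrative, not the actual harness):

private void timeTenRuns(java.util.concurrent.Callable<Long> algorithm) throws Exception {
    algorithm.call();               // warm-up run; result discarded
    long start = System.nanoTime();
    long sum = 0;
    for (int run = 0; run < 10; run++)
        sum = algorithm.call();     // keep the result so the work can't be optimised away
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    System.out.println("10 runs: " + elapsedMs + " ms (sum = " + sum + ")");
}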

The first thing to try is the obvious approach:

private long sumLineByLine() throws NumberFormatException, IOException {
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    long total = 0;
    while ((line = br.readLine()) != null) {
        int k = Integer.parseInt(line);
        total += k;
    }
    br.close();
    return total;
}

Note that the maximum possible return value is 10^8 lines × 10^9 per line = 10^17, which still easily fits in a long, so we don't have to worry about overflows.

On my machine, running this 11 times and discounting the first run takes around 92.9 seconds.

Inspired by a comment on this question, I tried not creating a new int k to store the result of parsing the line, and instead just to add the parsed value directly to total. So this:

    while ((line = br.readLine()) != null) {
        int k = Integer.parseInt(line);
        total += k;
    }

becomes this:

    while ((line = br.readLine()) != null)
        total += Integer.parseInt(line);

I was certain that this wouldn't make any difference, and thought it highly likely that the compiler would generate the same bytecode for the two versions. But, to my surprise, it did shave a little time off: we're down to 92.1 seconds.

One thing that bothers me about the code so far is that we turn the String into an int, and then add it on at the end. Might it not be quicker to add on as we go? What happens if we parse the String ourselves? Something like this...

private long sumLineByLineManualParse() throws NumberFormatException,
        IOException {
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    long total = 0;
    while ((line = br.readLine()) != null) {
        char chs[] = line.toCharArray();
        int mul = 1;
        for (int i = chs.length - 1; i >= 0; i--) {
            char c = chs[i];
            switch (c) {
            case '0':
                break;
            case '1':
                total += mul;
                break;
            case '2':
                total += (mul << 1);
                break;
            case '4':
                total += (mul << 2);
                break;
            case '8':
                total += (mul << 3);
                break;
            default:
                total += (mul*((byte) c - (byte) ('0')));   
            }
            mul*=10;
        }
    }
    br.close();
    return total;
}

This, I thought, might save a little time, especially with some bitshift optimisations for doing the multiplication. But the overheads of converting to a character array must swamp any gains: this now takes 148.2 seconds.

One last thing we can try is to process the file as binary data.

Parsing an integer from the front is awkward if you don't know the length of it. Parsing it backwards is much easier: the first digit you encounter is units, the next one is tens, and so on (reading "352" from the right gives 2×1 + 5×10 + 3×100). So the easiest way to approach the whole thing is to read the file backwards.

If we allocate a byte[] buffer of (say) 8MB, we can fill it up with the last 8MB of the file, process it, then read the preceding 8MB, and so on. We need to be a little careful that we don't screw up a number that we're in the middle of parsing when we move to the next block, but that's the only problem.

When we encounter a digit, we add it (suitably multiplied according to its position in the numeral) to the total, and then multiply the coefficient by 10 so we're ready for the next digit. If we encounter anything that isn't a digit (a CR or LF), we just reset the coefficient.

private long sumBinary() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    int lastRead = (int) raf.length();
    byte buf[] = new byte[8*1024*1024];
    int mul = 1;
    long total = 0;
    while (lastRead>0) {
        int len = Math.min(buf.length, lastRead);
        raf.seek(lastRead-len);
        raf.readFully(buf, 0, len);
        lastRead-=len;
        for (int i=len-1; i>=0; i--) {
            //48 is '0' and 57 is '9'
            if ((buf[i]>=48) && (buf[i]<=57)) {
                total+=mul*(buf[i]-48);
                mul*=10;
            } else
                mul=1;
        }
    }
    raf.close();
    return total;
}

This runs in 30.8 seconds! That's a speed increase by a factor of 3 over the previous best.

  1. Why is this so much faster? I was expecting it to win, but not quite so impressively. Is it mainly the overheads of converting to a String? And all the worrying behind the scenes about character sets and the like?
  2. Can we do any better than this by using a MappedByteBuffer to help? I have a feeling that the overheads of invoking methods to read from the buffer would slow things down, especially when reading backwards from the buffer.
  3. Would it be better to read the file forwards rather than backwards, but still scan the buffer backwards? The idea would be that you read the first chunk of the file, and then scan backwards, but discarding the half-number at the end. Then when you read the next chunk, you set the offset so that you read from the beginning of the number you discarded.
  4. Is there anything I haven't thought of that could make a significant difference?

Update: more surprising results

First, an observation. It should have occurred to me before, but I think the reason for the inefficiency of the String-based reading is not so much the time taken to create all the String objects but the fact that they are so short-lived: we've got 100,000,000 of them for the garbage collector to deal with. That is bound to upset it.
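
One way to test this hypothesis (a sketch; these are not measurements from the original experiments, and the method name printGcStats is illustrative) is to read the collector's counters before and after a run, via the standard management beans:

static void printGcStats(String label) {
    long count = 0, timeMs = 0;
    for (java.lang.management.GarbageCollectorMXBean gc :
            java.lang.management.ManagementFactory.getGarbageCollectorMXBeans()) {
        count += gc.getCollectionCount();   // total collections so far
        timeMs += gc.getCollectionTime();   // total collection time in ms
    }
    System.out.println(label + ": " + count + " collections, " + timeMs + " ms");
}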

Now some experiments based on answers/comments people have posted.

One suggestion was that since a BufferedReader uses a default buffer of 16KB, and I've used a buffer of 8MB, I'm not comparing like with like. It's bound to be faster if you use a bigger buffer.
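
To make that comparison like for like, the buffer size can be passed straight to the BufferedReader constructor (note that the second argument is measured in chars, not bytes); for example:

// Same as in sumLineByLine(), but with an explicit 8MB buffer so the
// String-based version gets the same buffer size as the binary one.
BufferedReader br = new BufferedReader(new FileReader(file), 8 * 1024 * 1024);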

Here's the shock. The sumBinary() method (Method 4) ran in 30.8 seconds yesterday with an 8MB buffer. Today, code unchanged, the wind direction has changed and we're at 30.4 seconds. If I drop the buffer size down to 16KB to see how much slower it gets, it gets faster! It now runs in 23.7 seconds. Crazy. Who saw that one coming?!

A bit of experimentation suggests that 16KB is about optimal. Perhaps the Java guys did the same experiments, and that's why they went with 16KB!

I wondered about this too. How much time is spent on disk access, and how much on number crunching? If it's almost all disk access, as suggested by a well-supported comment on one of the proposed answers, then we won't be able to make much improvement whatever we do.

This is easy to test by running the code with all the parsing and number crunching commented out, but with the reading still intact:

private long sumBinary() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    int lastRead = (int) raf.length();
    byte buf[] = new byte[16 * 1024];
    int mul = 1;
    long total = 0;
    while (lastRead > 0) {
        int len = Math.min(buf.length, lastRead);
        raf.seek(lastRead - len);
        raf.readFully(buf, 0, len);
        lastRead -= len;
        /*for (int i = len - 1; i >= 0; i--) {
            if ((buf[i] >= 48) && (buf[i] <= 57)) {
                total += mul * (buf[i] - 48);
                mul *= 10;
            } else
                mul = 1;
        }*/
    }
    raf.close();
    return total;
}

This now runs in 3.7 seconds! This doesn't look I/O-bound to me.

Of course, some of the I/O speed will come from disk cache hits. But that isn't really the point here: we're still taking 20 seconds of CPU time (also confirmed using Linux's time command), which is plenty big enough to try to reduce it.

I'd maintained in my original post that there was good reason to scan the file backwards rather than forwards. I didn't explain that very well. The idea was that if you scan a number forwards, you have to accumulate the total value of the scanned number, and then add it on. If you scan backwards, you can add it to the cumulative total as you go. My subconscious was making some sort of sense to itself (on which more later), but I'd missed one key point, which was pointed out in one of the answers: to scan backwards, I was doing two multiplications per iteration, but with scanning forwards you need only one. So I coded up a forward-scanning version:

private long sumBinaryForward() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    int fileLength = (int) raf.length();
    byte buf[] = new byte[16 * 1024];
    int acc = 0;
    long total = 0;
    int read = 0;
    while (read < fileLength) {
        int len = Math.min(buf.length, fileLength - read);
        raf.readFully(buf, 0, len);
        read += len;
        for (int i = 0; i < len; i++) {
            if ((buf[i] >= 48) && (buf[i] <= 57))
                acc = acc * 10 + buf[i] - 48;
            else {
                total += acc;
                acc = 0;
            }
        }
    }
    raf.close();
    return total;
}

This runs in 20.0 seconds, beating the backward-scanning version by a distance. Nice.

What I realised during the night, though, was that although I was performing two multiplications per iteration, there was the possibility of using a cache to store these multiplications, so that I could avoid having to perform them during backwards iteration. I was pleased to see when I woke up that someone had had the same idea!

The point is that there are at most 10 digits in the numbers we're scanning, and only 10 possible digits, so there are only 100 possible values for the contribution of a digit to the cumulative total. We can precompute these, and then use them in the backward-scanning code. That ought to beat the forward-scanning version, because we've now got rid of the multiplications entirely. (Note that we can't do this with forward scanning, because the multiplication is of the accumulator, which could take any value up to 10^9. It's only in the backward case that both operands are limited to a few possibilities.)

private long sumBinaryCached() throws IOException {
    int mulCache[][] = new int[10][10];
    int coeff = 1;
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++)
            mulCache[i][j] = coeff * j;
        coeff *= 10;
    }

    RandomAccessFile raf = new RandomAccessFile(file, "r");
    int lastRead = (int) raf.length();
    byte buf[] = new byte[16 * 1024];
    int mul = 0;
    long total = 0;
    while (lastRead > 0) {
        int len = Math.min(buf.length, lastRead);
        raf.seek(lastRead - len);
        raf.readFully(buf, 0, len);
        lastRead -= len;
        for (int i = len - 1; i >= 0; i--) {
            if ((buf[i] >= 48) && (buf[i] <= 57))
                total += mulCache[mul++][buf[i] - 48];
            else
                mul = 0;
        }
    }
    raf.close();
    return total;
}

This runs in 26.1 seconds. Disappointing, to say the least. Reading backwards is less efficient in terms of I/O, but we've seen that I/O is not the major headache here. I had expected this to make a big positive difference. Perhaps the array lookup is just as expensive as the multiplications we've replaced. (I did try making the array 16x16, and using bitshifts to index, but it didn't help.)
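
For reference, the 16x16 variant mentioned above might have looked something like this (a reconstruction from the description, not the original code): flatten the table into a single array of 256 entries, so the index becomes a shift and an OR.

int mulCacheFlat[] = new int[16 * 16];
int coeff = 1;
for (int i = 0; i < 10; i++) {
    for (int j = 0; j < 10; j++)
        mulCacheFlat[(i << 4) | j] = coeff * j;   // row = digit position, column = digit
    coeff *= 10;
}
// ... and in the backward-scanning loop:
//     total += mulCacheFlat[(mul << 4) | (buf[i] - 48)];
//     mul++;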

So it looks like forward scanning is the way to go.

Next thing to add in is a MappedByteBuffer, to see if that's more efficient than using a raw RandomAccessFile. It doesn't need much change to the code.

private long sumBinaryForwardMap() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    byte buf[] = new byte[16 * 1024];
    final FileChannel ch = raf.getChannel();
    int fileLength = (int) ch.size();
    final MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0,
            fileLength);
    int acc = 0;
    long total = 0;
    while (mb.hasRemaining()) {
        int len = Math.min(mb.remaining(), buf.length);
        mb.get(buf, 0, len);
        for (int i = 0; i < len; i++)
            if ((buf[i] >= 48) && (buf[i] <= 57))
                acc = acc * 10 + buf[i] - 48;
            else {
                total += acc;
                acc = 0;
            }
    }
    ch.close();
    raf.close();
    return total;
}

This does seem to improve things a little: we're now at 19.0 seconds. We've taken another second off our personal best!

One of the proposed answers involves using multiple cores. I'm a little ashamed that that hadn't occurred to me!

The answer came in for some stick, because of the assumption that it's an I/O-bound problem. This seems a little harsh, in light of the results about I/O! Certainly worth a try, in any case.

We'll do this using fork/join. Here's a class to represent the result of a computation on part of the file, bearing in mind that there might be a partial result to the left (if we started half way through a number), and a partial result to the right (if the buffer finished half way through a number). The class also has a method for allowing us to glue two such results together, into a combined result for two adjacent sub-tasks.

private class SumTaskResult {
    long subtotal;
    int leftPartial;
    int leftMulCount;
    int rightPartial;
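
    // Worked example (illustrative, not part of the original post): if 12345
    // is split across two adjacent sub-tasks as "...123" / "45...", the left
    // task ends with rightPartial == 123, and the right task starts with
    // leftPartial == 45 and leftMulCount == 100 (two leading digits consumed,
    // so the multiplier is 10^2). append() then reconstructs
    // 123 * 100 + 45 == 12345 into the combined subtotal.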

    public void append(SumTaskResult rightward) {
        subtotal += rightward.subtotal + rightPartial
                * rightward.leftMulCount + rightward.leftPartial;
        rightPartial = rightward.rightPartial;
    }
}

Now the key bit: the RecursiveTask that computes the result. For small problems (less than 64 characters), it calls computeDirectly() to calculate the result in a single thread; for larger problems, it splits into two, solves the two sub-problems in separate threads, and then combines the results.

private class SumForkTask extends RecursiveTask<SumTaskResult> {

    private byte buf[];
    // startPos inclusive, endPos exclusive
    private int startPos;
    private int endPos;

    public SumForkTask(byte buf[], int startPos, int endPos) {
        this.buf = buf;
        this.startPos = startPos;
        this.endPos = endPos;
    }

    private SumTaskResult computeDirectly() {
        SumTaskResult result = new SumTaskResult();
        int pos = startPos;

        result.leftMulCount = 1;

        while ((buf[pos] >= 48) && (buf[pos] <= 57)) {
            result.leftPartial = result.leftPartial * 10 + buf[pos] - 48;
            result.leftMulCount *= 10;
            pos++;
        }

        int acc = 0;
        for (int i = pos; i < endPos; i++)
            if ((buf[i] >= 48) && (buf[i] <= 57))
                acc = acc * 10 + buf[i] - 48;
            else {
                result.subtotal += acc;
                acc = 0;
            }

        result.rightPartial = acc;
        return result;
    }

    @Override
    protected SumTaskResult compute() {
        if (endPos - startPos < 64)
            return computeDirectly();
        int mid = (endPos + startPos) / 2;
        SumForkTask left = new SumForkTask(buf, startPos, mid);
        left.fork();
        SumForkTask right = new SumForkTask(buf, mid, endPos);
        SumTaskResult rRes = right.compute();
        SumTaskResult lRes = left.join();
        lRes.append(rRes);
        return lRes;
    }

}

Note that this is operating on a byte[], rather than the whole MappedByteBuffer. The reason for that is that we want to keep the disk access sequential. We'll take quite large chunks, fork/join, and then move to the next chunk.

Here's the method that does that. Note that we've pushed the buffer size up to 1MB (sub-optimal earlier, but more sensible here, it seems).

private long sumBinaryForwardMapForked() throws IOException {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    ForkJoinPool pool = new ForkJoinPool();

    byte buf[] = new byte[1 * 1024 * 1024];
    final FileChannel ch = raf.getChannel();
    int fileLength = (int) ch.size();
    final MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0,
            fileLength);
    SumTaskResult result = new SumTaskResult();
    while (mb.hasRemaining()) {
        int len = Math.min(mb.remaining(), buf.length);
        mb.get(buf, 0, len);
        SumForkTask task = new SumForkTask(buf, 0, len);
        result.append(pool.invoke(task));
    }
    ch.close();
    raf.close();
    pool.shutdown();
    return result.subtotal;
}

Now here's the soul-destroying disappointment: this nicely multi-threaded code now takes 32.2 seconds. Why so slow? I spent quite a while debugging this, assuming I'd done something terribly wrong.

Turns out there was just one small tweak needed. I'd thought the threshold of 64 between small problem and big problem was a reasonable one; turns out that was totally ridiculous.

Think about it like this. The sub-problems are exactly the same size, so they should complete in pretty much the same time. So there's really no point splitting into more pieces than there are processors available. On the machine I'm using, with only two cores, going down to a threshold of 64 is ridiculous: it just adds more overhead.

Now you don't want to limit things so that it only uses two cores even when there are more available. Perhaps the right thing to do would be to find out the number of processors at runtime, and split into that many pieces.
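
A sketch of that idea (an illustration of how it might be wired in, not code from the original experiments):

// Derive the fork/join threshold from the core count, so each 1MB
// chunk splits into roughly one piece per processor.
int cores = Runtime.getRuntime().availableProcessors();
int threshold = Math.max(1, (1 * 1024 * 1024) / cores);   // e.g. 512KB on two cores
// ... and in SumForkTask.compute():
//     if (endPos - startPos < threshold)
//         return computeDirectly();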

In any case, if I change the threshold to 512KB (half the buffer size), it now completes in 13.3 seconds. Going down to 128KB or 64KB would allow more cores to be used (up to 8 or 16 respectively), and doesn't significantly affect the runtime.

So multi-threading does make a big difference.

It's been quite a long journey, but we started out with something that took 92.9 seconds and we're now down to 13.3 seconds... that's seven times the speed of the original code. And that's not by improving the asymptotic (big-Oh) time complexity, which was linear (optimal) right from the start... this has all been about improving the constant factor.

A good day's work.

I suppose I should probably try using the GPU next...

I generated the random numbers with the following code, which I ran and redirected to a file. Obviously I can't guarantee that you'll end up with exactly the same random numbers that I had :)

public static void genRandoms() {
    Random r = new Random();
    for (int i = 0; i < 100000000; i++)
        System.out.println(r.nextInt(1000000000));
}

Answer

I think there is another way of doing this.

This is a classic multi-process programming problem. In C there is the MPI library, which solves this kind of problem.

The idea is to split the list of integers into, say, 4 parts, and have each part summed by a different process. When they finish, the partial results are added together.

In Java this could be done with threads (pseudo-parallel) and the Java concurrency utilities.

E.g. 4 different threads sum 4 different parts of the list, and at the end the partial sums are added together.
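
A minimal sketch of that idea (the name parallelSum and the in-memory array are assumptions; a real version would combine this with the chunked file reading shown earlier):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Sum an int[] in 4 parts on 4 threads, then add the partial sums.
static long parallelSum(final int[] data) throws Exception {
    final int parts = 4;
    ExecutorService pool = Executors.newFixedThreadPool(parts);
    List<Future<Long>> futures = new ArrayList<Future<Long>>();
    int chunk = (data.length + parts - 1) / parts;
    for (int p = 0; p < parts; p++) {
        final int from = p * chunk;
        final int to = Math.min(data.length, from + chunk);
        futures.add(pool.submit(new Callable<Long>() {
            public Long call() {
                long sum = 0;
                for (int i = from; i < to; i++)
                    sum += data[i];
                return sum;
            }
        }));
    }
    long total = 0;
    for (Future<Long> f : futures)
        total += f.get();       // blocks until each partial sum is ready
    pool.shutdown();
    return total;
}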

Telephone companies use grid computers that apply this kind of parallel programming technique to sum their transactions.

The only problem here (the bottleneck) is the I/O. Reading the file will take a long time. If you could somehow get multiple threads to read different parts of the file... This is a very complicated approach, and I don't think it would do much good, because the disk won't spin faster just because it's being used by many threads, but there are other techniques for doing similar things. You can read more about this here: Access File through multiple threads, and here: Reading a single file with Multiple Thread: should speed up?
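
For what it's worth, here is a sketch of how several threads could read disjoint parts of one file without stepping on each other (positional reads on a shared FileChannel; the name readRegion is illustrative, and as noted above, whether this actually helps depends on the disk):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Each thread reads its own region using positional reads, which
// don't move the channel's shared position and are safe to call
// from multiple threads.
static void readRegion(FileChannel ch, long start, long end) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(16 * 1024);
    long pos = start;
    while (pos < end) {
        buf.clear();
        buf.limit((int) Math.min(buf.capacity(), end - pos));
        int n = ch.read(buf, pos);   // positional read
        if (n < 0) break;
        pos += n;
        // ... scan buf.array() up to n bytes, as in sumBinaryForward()
    }
}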
