Memoization Efficiency Problems (Collatz Hailstone Sequence)

Question

I was particularly interested over the last few days (more from an algorithmic than a mathematical perspective) in investigating the length of a given number's Hailstone sequence (Collatz conjecture). Implementing a recursive algorithm is probably the simplest way to calculate the length, but it seemed to me like an unnecessary waste of calculation time. Many sequences overlap; take for example 3's Hailstone sequence:

3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

This has length 7; more specifically, it takes 7 operations to get to 1. If we then take 6:

6 -> 3 -> ...

We notice immediately that we've already calculated this, so we just add on the sequence length of 3 instead of running through all those numbers again, considerably reducing the number of operations required to calculate the sequence length of each number.

I tried to implement this in Java using a HashMap (which seemed appropriate given its expected O(1) get/put complexity):

import java.util.HashMap;

/* NOTE: cache.put(1L, 0L); is called in main to act as the
 * 'base case' of sorts. 
 */

private static HashMap<Long, Long> cache = new HashMap<>();

/* Returns the length of the sequence, pulling prerecorded values
 * from the cache whenever possible, and saving unrecorded values
 * to the cache.
 */
static long seqLen(long n) {
    long count = 0, m = n;
    while (true) {
        if (cache.containsKey(n)) {
            count += cache.get(n);
            cache.put(m, count);
            return count;
        }
        else if (n % 2 == 0) {
            n /= 2;
        }
        else {
            n = 3*n + 1;
        }
        count++;
    }
}

What seqLen essentially does is start at a given number and work through that number's Hailstone sequence until it comes across a number already in the cache, at which point it adds that number's cached length to the current count, and then logs the starting value and its total sequence length in the HashMap as a (key, val) pair.
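
For completeness, the snippet above assumes a surrounding class and a main method that seeds the base case before the first call, as the NOTE says. A minimal sketch of that setup (the loop bound here is arbitrary and the method is assumed static alongside seqLen; neither is from the original):

public static void main(String[] args) {
    // Base case: the Hailstone sequence of 1 has length 0, so
    // every chain of lookups eventually terminates here.
    cache.put(1L, 0L);

    // Fill the cache in ascending order so that shorter
    // sequences are usually already recorded when needed.
    for (long i = 2; i < 1000000; i++) {
        seqLen(i);
    }
}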

I also had the following fairly standard recursive algorithm for comparison:

static long recSeqLen(long n) {
    if (n == 1) {
        return 0;
    }
    else if (n % 2 == 0) {
        return 1 + recSeqLen(n / 2);
    }
    else return 1 + recSeqLen(3*n + 1);
}

The logging algorithm should, by all accounts, run quite a bit quicker than the naive recursive method. However, in most cases it doesn't run much faster at all, and for larger inputs it actually runs slower. Running the following code yields times that vary considerably as the size of n changes:

long n = ... // However many numbers I want to calculate sequence
             // lengths for.

long st = System.nanoTime();
// Iterative logging algorithm
for (long i = 2; i < n; i++) {
    seqLen(i);
}
long et = System.nanoTime();
System.out.printf("HashMap algorithm: %d ms\n", (et - st) / 1000000);

st = System.nanoTime();
// Using recursion without logging values:
for (long i = 2; i < n; i++) {
    recSeqLen(i);
}
et = System.nanoTime();
System.out.printf("Recusive non-logging algorithm: %d ms\n",
                    (et - st) / 1000000);

  • n = 1,000: ~2ms for both algorithms
  • n = 100,000: ~65ms for the logging algorithm, ~75ms for the recursive non-logging one
  • n = 1,000,000: ~500ms (logging) and ~900ms (recursive)
  • n = 10,000,000: ~14,000ms (logging) and ~10,000ms (recursive)

At higher values I get memory errors, so I can't check if the pattern continues.

So my question is: why does the logging algorithm suddenly begin to take longer than the naive recursive algorithm for large values of n?


EDIT:

Scrapping HashMaps altogether and opting for a simple array structure (as well as removing part of the overhead of checking whether a value is in the array or not) produces the desired efficiency:

// Flat array cache: index = starting number, value = sequence
// length. Java zero-initializes the array, so cache[1] == 0
// already encodes the base case.
private static final int CACHE_SIZE = 80000000;
private static long[] cache = new long[CACHE_SIZE];

static long seqLen(long n) {
    int count = 0;
    long m = n;

    // Follow the sequence only until it drops below the starting
    // value m; when filling the cache in ascending order, every
    // value below m has already been computed and stored.
    do {
        if (n % 2 == 0) {
            n /= 2;
        }
        else {
            n = 3*n + 1;
        }
        count++;
    } while (n > m);

    count += cache[(int)n];
    cache[(int)m] = count;
    return count;
}

Iterating over the entire cache size (80 million) now takes a mere 3 seconds, as opposed to 93 seconds using the recursive algorithm. The HashMap algorithm throws a memory error, so it can't even be compared, but given its behaviour at lower values, I have a feeling it wouldn't compare well.
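
(For reference, the 80-million run described above presumably uses a driver along these lines; this is a sketch under that assumption, mirroring the earlier timing harness rather than the author's exact code:)

long st = System.nanoTime();
// cache[1] stays 0 (Java zero-initializes arrays), which serves
// as the base case, so iteration starts at 2 as before.
for (long i = 2; i < CACHE_SIZE; i++) {
    seqLen(i);
}
long et = System.nanoTime();
System.out.printf("Array algorithm: %d ms\n", (et - st) / 1000000);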

Solution

Off the cuff, I'd guess it's spending a lot of time reallocating the hash map. Sounds like you're starting it off empty and keep adding stuff to it. That means as it grows in size, it will need to allocate a bigger chunk of memory to store your data, and recompute the hash for all elements, which is O(N). Try pre-allocating the size to what you expect to put in there. See https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html for more discussion.
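
Concretely, pre-allocating means passing an initial capacity (and optionally a load factor) to the constructor so the table never has to grow while it's being filled. A minimal sketch, assuming the same 80 million entries as the array version above (0.75f is HashMap's documented default load factor):

// HashMap resizes, and rehashes every entry, once its size
// exceeds capacity * loadFactor; requesting expected/loadFactor
// buckets up front avoids all intermediate reallocations.
private static final int EXPECTED_ENTRIES = 80000000;
private static HashMap<Long, Long> cache =
        new HashMap<>((int) (EXPECTED_ENTRIES / 0.75f) + 1, 0.75f);

Even pre-sized, though, each entry still boxes two Long objects plus a table node, so the flat long[] cache from the edit above is likely to remain far more memory-efficient.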
