Mathematica“链表"和性能 [英] Mathematica "linked lists" and performance

查看:27
本文介绍了Mathematica“链表"和性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 Mathematica 中,我像这样创建单链表:

In Mathematica, I create singly linked lists like so:

toLinkedList[x_List] := Fold[pair[#2, #1] &, pair[], Reverse[x]];

fromLinkedList[ll_pair] := List @@ Flatten[ll];

emptyQ[pair[]] := True;
emptyQ[_pair] := False;    

对 cons 单元使用符号 pair 具有 Flatten 安全工作的优点,即使列表包含 Mathematica 样式的 Lists,并允许您使用 MakeExpression/MakeBoxes 定义自定义符号,这让一切变得更加愉快.为了避免使用 $IterationLimit,我编写了使用 While 循环或 NestWhile 而不是使用递归来处理这些列表的函数.当然,我想看看哪种方法会更快,所以我写了两个候选人,这样我就可以看他们打架了:

Using the symbol pair for the cons cells has the advantage of Flatten working safely even if the lists contain Mathematica-style Lists, and allows you to define custom notation using MakeExpression/MakeBoxes, which makes everything much more pleasant. In order to avoid having to muck around with $IterationLimit, I wrote functions to work with these lists using either While loops or NestWhile instead of using recursion. Naturally, I wanted to see which approach would be faster, so I wrote two candidates so I could watch 'em fight:

nestLength[ll_pair] := 
 With[{step = {#[[1, -1]], #[[-1]] + 1} &},
  Last@NestWhile[step, {ll, 0}, ! emptyQ@First@# &]];

whileLength[ll_pair] := 
 Module[{result = 0, current = ll},
  While[! emptyQ@current,
   current = current[[2]];
   ++result];
  result];

结果很奇怪.我在长度为 10000 的链表上测试了这些函数,whileLength 通常快了大约 50%,大约 0.035 秒到 nestLength 的 0.055 秒.然而,偶尔 whileLength 需要大约 4 秒.我认为可能存在一些缓存行为,因此我开始生成新的随机列表进行检查,并且 whileLength 在第一次运行新列表时不一定会很慢;可能需要几十次才能看到减速,但它不会再次发生(至少对于我尝试对每个列表进行的 200 次运行不会再次发生).

The results were very strange. I tested the functions on linked lists of length 10000, and whileLength was usually about 50% faster, at about 0.035 seconds to nestLength's 0.055 seconds. However, occasionally whileLength would take about ~4 seconds. I thought there might be some caching behavior, so I started generating fresh, random lists to check, and whileLength wouldn't necessarily be slow on the first run with a new list; it might take dozens of times to see the slowdown, but then it wouldn't recur (at least not for the 200 runs I was trying with each list).

可能会发生什么?

作为参考,我用来测试的函数是这样的:

For reference, the function I used for testing is this:

getTimes[f_, n_] :=
 With[{ll = toLinkedList@RandomInteger[100, 10000]},
  Table[Timing[f@ll], {n}][[All, 1]]]

我之前没有提到这个版本;我用 Mathematica 8 得到了这些结果.

I neglected to mention the version earlier; I got these results with Mathematica 8.

编辑第二个:当我阅读 DanielLichtblau 的回答,我意识到我的典型"运行时间省略了前导 0.它已被修复.

EDIT the second: When I read Daniel Lichtblau's answer, I realized that my times for "typical" runs omitted a leading 0. It's been fixed.

编辑第三个:我认为 Leonid Shifrin 将问题与 Module 相关联是正确的;通过将 With 替换为 Module,我可以从基于 NestWhile 的版本中获得相同类型的行为:

EDIT the third: I think Leonid Shifrin is correct to associate the issue with Module; I can get the same sort of behavior from the NestWhile-based version by replacing the With with a Module:

nestModuleLength[ll_pair] := 
  Module[{step = {#[[1, -1]], #[[-1]] + 1} &}, 
   Last@NestWhile[step, {ll, 0}, ! emptyQ@First@# &]];

In[15]:= Select[getTimes[nestModuleLength, 100], # > 3 &]
Out[15]= {3.797}

推荐答案

下面的例子给出了典型的结果.

The examples below give typical results.

长度为 20 次的慢速示例.

One slow example in a length 20 run.

In[18]:= getTimes[whileLength, 20]

Out[18]= {0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, \
0.031, 0.047, 0.032, 0.031, 0.031, 3.547, 0.047, 0.031, 0.031, 0.032, \
0.031, 0.031}

我顺便注意到,除了可比较的慢速情况外,时间比原始帖子快约 10 倍.不确定是什么导致了这种比率差异.

I note in passing that the timings are ~10x faster than in the original post, except for the slow cases which are comparable. Not sure what accounts for that difference in ratios.

没有慢的例子.

In[17]:= getTimes[nestLength, 20]

Out[17]= {0.047, 0.047, 0.062, 0.047, 0.047, 0.062, 0.047, 0.047, \
0.047, 0.063, 0.046, 0.047, 0.047, 0.063, 0.047, 0.046, 0.047, 0.063, \
0.047, 0.047}

长度为 100 次的慢速示例.

One slow example in a length 100 run.

In[19]:= getTimes[whileLength, 100]

Out[19]= {0.031, 0.031, 0.031, 0.032, 0.031, 3.594, 0.047, 0.031, \
0.031, 0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.047, 0.031, \
0.031, 0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.047, 0.031, 0.031, \
0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, \
0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, \
0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, 0.031, \
0.032, 0.031, 0.031, 0.031, 0.032, 0.031, 0.047, 0.031, 0.031, 0.032, \
0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.031, 0.032, 0.046, 0.032, \
0.031, 0.031, 0.031, 0.032, 0.031, 0.031, 0.047, 0.031, 0.032, 0.031, \
0.031, 0.031, 0.032, 0.031, 0.047, 0.031, 0.031, 0.031, 0.032, 0.031, \
0.031, 0.031}

Mathematica 不完美地实现了所谓的无限评估".也就是说,一个表达式会重新计算,直到它停止改变.为了使这个过程相当快,有各种优化尝试尽可能缩短流程.

Mathematica implements, imperfectly, what is called "infinite evaluation". That is to say, an expression reevaluates until it stops changing. To make this reasonably fast there are various optimizations that attempt to short circuit the process whenever possible.

在某些情况下,这可能很难辨别(由于类似于哈希冲突的效果),并且可能会不必要地重新评估表达式.深度嵌套的表达式往往是最坏的情况.我们还有更多的代码可以经常解决这些问题,即使发生冲突也是如此.

In some cases this can be tricky to discern (due to an effect similar to hash collisions), and expressions might be needlessly reevaluated. Deeply nested expressions tend to be a worst case for this. We have further code that will often address these even in cases of collisions.

这个例子中的罪魁祸首正是这段试图快速确定表达式是否需要重新计算的代码.这是奇怪的,但可能是一个线索(对某人来说),在该 While 循环内的运行中最多发生一次.因此,在糟糕的情况下会发生一些事情,以防止在同一 While 内再次发生.

The culprit in this instance is exactly this code that attempts to determine quickly whether an expression requires reevaluation. It is peculiar but possibly a clue (to someone) that this happens at most once in a run inside that While loop. So something happens in the bad cases that prevents recurrence whilst inside the same While.

有一次我很熟悉重新评估检测代码,写了一大块.但是它在版本 8 中被重写了.因此,即使在调试器中看到这种次优行为后,对我来说仍然是个谜.我现在只能说我提交了错误报告.

At one time I was familiar with the reevaluation detection code, having written a chunk of it. But it was rewritten for version 8. So even after seeing this suboptimal behavior in a debugger, it is is something of a mystery to me. About all I can say right now is that I filed a bug report.

正如 Leonid Shifrin 所观察到的,具有 HoldAllComplete 属性的符号不受此问题的影响.因此,使用该属性可能对此类代码有益.

As Leonid Shifrin observed, symbols with Attribute of HoldAllComplete are immune to this problem. So using that attribute can be beneficial for this type of code.

丹尼尔·利希布劳Wolfram 研究

Daniel Lichtblau Wolfram Research

这篇关于Mathematica“链表"和性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆