为什么已经弹出的范围会影响后续范围中的 check-sat 时间? [英] Why do already popped scopes affect the check-sat time in subsequent scopes?

查看:54
本文介绍了为什么已经弹出的范围会影响后续范围中的 check-sat 时间?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我多次注意到已经弹出的 push-pop 作用域似乎会影响后续作用域中 check-sat 所需的时间.

I've noticed several times that push-pop scopes that have already been popped appear to affect the time that a check-sat in a subsequent scope needs.

也就是说,假设一个程序有多个(可能是任意嵌套的)push-pop 作用域,每个作用域都包含一个 check-sat 命令.此外,假设第二次 check-sat 需要 10s,而第一个只需要 0.1s.

That is, assume a program with multiple (potentially arbitrarily nested) push-pop scopes, each of which contain a check-sat command. Furthermore, assume that the second check-sat takes 10s, whereas the first one takes only 0.1s.

...
(push)
  (assert (not P))
  (check-sat) ; Could be sat, unsat or unknown
(pop)
...
(push)
  (assert (not Q))
  (check-sat) ; Could be sat, unsat or unknown
(pop)

评论第一个push-pop作用域后,第二个check-sat只需要1s.为什么会这样?

  • AFAIK,如果使用 push-pop 范围,Z3 会切换到增量求解器.他们可能会这样做是否有(概念上的)原因?

  • AFAIK, Z3 switches to incremental solvers if push-pop scopes are used. Is there a (conceptual) reason for why they might behave this way?

我曾被告知 Z3 按重要性对符号进行属性,这会影响证明搜索(并且符号的重要性也可能在证明搜索过程中发生变化).这可能是原因吗?是否可以重置重要性(在范围之间)?

I was told once that Z3 attributes symbols by importance, which affects proof search (and the importance of a symbol can also change during proof search). Could that be the reason? Is it possible to reset importance (in between scopes)?

会不会是bug?我找到了这篇文章 Leonardo 提到了一个似乎相关的错误(不过他的回答是 2012 年的).

Could it be a bug? I found this post where Leonardo mentions a bug that seems related (his answer is from 2012, though).

不幸的是,我只有相当长的(自动生成的)SMTLib 文件来说明行为,您可以在 这个要点.它使用量词和未解释的函数,但既不使用 mbqi,也不使用数组或位向量.该示例由 148 个嵌套的 push-pop 作用域和 89 个 check-sats 组成,Z3 4.3.2 大约需要 8 秒来处理它.最后一次 check-sat(以 echo 为前缀)花费的时间最长.

I unfortunately only have fairly long (automatically generated) SMTLib files that illustrate the behaviour, one of which you can find as this gist. It uses quantifiers and uninterpreted functions, but neither mbqi nor arrays or bit vectors. The example consists of 148 nested push-pop scopes and 89 check-sats, and it takes Z3 4.3.2 about 8s to process it. The last check-sat (prefixed by an echo) takes by far the longest.

我随机评论了几个 push-pop 作用域(一次一个,永远不会是最后一个,确保你不要评论符号声明),并且在大多数情况下,整体运行时间下降到不到 1 秒.也就是说,最后一次 check-sat 完成得明显更快.

I randomly commented several push-pop scopes (one at a time, never the last one, make sure you don't comment symbol declarations), and in most cases, the overall run time went down to less than 1s. That is, the last check-sat was done significantly faster.

为了提供更多细节,我将运行所有范围(慢,8 秒)的 Z3 与运行 Z3 进行了比较,其中以 [XXX] 标记的范围已被注释(快,0.3 秒)).结果可以在 this diff 中看到(左为慢,右为快).

To provide further details, I compared a run of Z3 with all scopes (slow, 8s) with a run of Z3 where the scope marked by [XXX] had been commented (fast, 0.3s). The results can be seen in this diff (left is slow, right is fast).

差异表明所有 check-sat 的行为都相同(我通过 echo'ing "unsat" 伪造了评论的一个),从中我得出结论,评论的范围会影响证明搜索,但不会影响其最终结果.

The diff shows that all check-sats behave the same (I faked the commented one by echo'ing "unsat"), from which I conclude that the commented scope affects the proof search, but not its final outcome.

我也试图从获得的统计数据的差异中找出一些意义,但我对如何正确解释统计数据知之甚少.以下是我能理解的一些统计数据:

I also tried to make some sense out of differences in the obtained statistics, but I know very little about how to interpret the statistics correctly. Here are the few statistics I could make sense of:

  • grobner(383 对 36)和 nonlinear-horner(342 对 25),所以看起来较慢的运行执行更多与算术相关的操作.注释范围确实是非线性算术(还有很多其他的),但是注释范围中的特定证明应该是微不足道的",它本质上表明 x != 0对于 x 关于 0 已被明确假定.

  • grobner (383 vs 36) and nonlinear-horner (342 vs 25), so it seems that the slower run performs more arithmetic-related operations. The commented scope is indeed about to non-linear arithmetic (and so are lots of others), but the particular proof in the commented scope should be "trivial", it is essentially showing that x != 0 for an x about which 0 < x has been assumed explicitly.

memory(40 vs 7),我认为这表明 Z3 在程序的慢速版本中探索了更大的搜索空间

memory (40 vs 7), which I interpret as indicating that Z3 explores a larger search space in the slow version of the program

quant-instantiations(43k 对 51k),这让我感到惊讶,因为明显更快的运行仍然触发了更多的量词实例化.

quant-instantiations (43k vs 51k), which surprised me because the significantly faster run nevertheless triggered notably more quantifier instantiations.

推荐答案

我不确定这是观察还是问题?是的,Z3 对于不同的输入会有不同的表现,push/pop 并不是无辜的",也就是说,它们会对性能产生重大影响.如果它们可以完全删除,这是最明显的,因为这允许 Z3 切换到不支持增量(但通常更快)的完全不同的子求解器.例如,对于没有作用域的纯粹的 bit-blasted 公式,Z3 将使用一个快速、新的 SAT 求解器,但如果需要 push/pop,它会回退到一个更老、更慢的 SAT 求解器(这两个求解器的实现完全是不相交).

I'm not exactly sure whether this is an observation or a question? Yes, Z3 will behave differently for different inputs and push/pop are not "innocent", i.e., they will have a major impact on the performance. This is most obvious if they can be removed completely, because that allows Z3 to switch to completely different sub-solvers that don't support incrementality (but are often faster). For example, for purely bit-blasted formulas without scoping, Z3 will use a fast, new SAT solver, but if push/pop are required, it falls back to a much older and slower SAT solver (the implementations of those two solvers are completely disjoint).

此外,删除一些其他范围之间的范围也可能产生巨大影响,因为它允许 Z3 保留更多中间引理以及启发式状态.如果出于某种原因希望每个查询都表现得好像没有其他查询一样,那么最好简单地生成独立的查询并在每个查询上从头开始 Z3.

Further, removing some of the scopes between others may have a huge impact too, because it allows Z3 to keep more intermediate lemmas, as well as heuristic states. If it is for some reason desired that each query behaves as if no others were there, then it's perhaps best to simply generate independent queries and to start Z3 from scratch on each of them.

有关所提到的具体问题的更多信息:

More information about the specific issues mentioned:

启发式状态"是指 Z3 使用的各种启发式数据,有大量不同的启发式在起作用,而不仅仅是像符号排序这样的特定一种.在查询之间保留此信息是否好"完全取决于您的问题 - 启发式方法适用于某些问题,但不是所有问题,启发式方法的性质也是如此.不过,增量的整个概念都建立在此基础上:如果启发式方法无济于事,那么我们最好运行独立查询.但是,在某些应用程序中,重置 Z3 有时 比不重置或独立查询要好,例如,当您有大量的小查询时.

"Heuristic states" means all kinds of heuristic data Z3 uses, there is a whole plethora of different heuristics at work, not just one particular one like the symbol ordering. Whether it is "good" to keep this information between queries depends entirely on your problems - the heuristics work on some problems, but not on all, as is the nature of heuristics. The whole concept of incrementality is built upon this though: if the heuristics wouldn't help, then we'd be better off running independent queries. However, in some applications, resetting Z3 sometimes is better than no resets or independent queries, for instance, when you have a huge number of tiny queries.

切换到不同求解器的概念性原因:第一个求解器不支持您需要的功能.参见combined_solver.cpp,函数check_sat.如果未使用求解器 1(例如,如果提供了假设或启用了增量模式),则将使用求解器 2.

Conceptual reason for switching to a different solver: the first one doesn't support the features you need. See combined_solver.cpp, function check_sat. If solver1 is not used (e.g., if assumptions are provided or if incremental mode is enabled) then solver2 will be used.

combined_solver.solver2_timeout 将放置一个超时solver2.当solver2 超时时会发生什么由选项combined_solver.solver2_unknown 设置.所以,是的,您可以在solver2 之后运行solver1,但也允许solver1 失败,即返回unknown.查看代码,如果使用它,它很可能是不合理的(例如,忽略假设).

combined_solver.solver2_timeout will put a timeout solver2. What happens when solver2 times out is set by the option combined_solver.solver2_unknown. So, yes, you can run solver1 after solver2, but solver1 is also allowed to fail, i.e., return unknown. Looking at the code, it may well be unsound (e.g., ignoring assumptions) if that's used.

回复:提到的错误报告:这是一个健全性错误,而不是性能错误;一个求解器说 SAT,另一个说 UNSAT.

Re: the bug report that was mentioned: That was a soundness bug, not a performance bug; one of the solvers said SAT and the other said UNSAT.

这篇关于为什么已经弹出的范围会影响后续范围中的 check-sat 时间?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆