优化 Prolog 程序(删除重复项,重复重新计算) [英] Optimising a Prolog Program (remove duplicates, repeat recalculation)

查看:23
本文介绍了优化 Prolog 程序(删除重复项,重复重新计算)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对 Prolog 非常缺乏经验.我有一个数据集,其中包含图形中具有循环性(相当多)的元素和关系.有规则可以计算路径的汇总关系.其中之一是:一个走路径,然后走最弱的关系,这就是两端之间的关系.

与元素 E1E2E3
关系R1/R1cR2R3(强度从低到高)和
结构 E1-R3-E2E1-R1-E2E2-R2-E3E3-R1-E2代码>

我可以做以下最小的例子:

% 弱表isWeaker(r1, r2).isWeaker(r2, r3).较弱(X,Y):- isWeaker(X,Y).较弱(X,Y):-isWeaker( X, Z),较弱(Z,Y).% '最弱' 是 <= not <最弱(X,X,Y):- =(X,Y).最弱(X,X,Y):-较弱(X,Y).% 所有直接关系isADirectRelation(e1, r1, e2).isADirectRelation(e1, r3, e2).isADirectRelation(e2, r2, e3).isADirectRelation(e3, r1, e2).isADirectRelation(e1, r3, e4).isADirectRelation(e4, r2, e3).isADirectRelation(e1, r1, e4).isADirectRelation(e3, r1, e4).% 派生关系计算% 结构链isADerivedRelation(源、关系、目标、访问):-\+ 成员([来源,关系,目标],已访问),最弱(关系,关系一,关系二),isARelation( Source, RelationOne, Intermediate, [[Source,Relation,Target]|Visited]),isARelation(中级,RelationTwo,目标,[[Source,Relation,Target]|Visited]).% 具有反圆度的主要主力isARelation(来源、关系、目标、访问):-(isADirectRelation(源,关系,目标);isADerivedRelation(源、关系、目标、访问)).

isARelation(Source, Relation, Target, []). 的结果是

e1,r1,e2e1,r3,e2e2,r2,e3e3,r1,e2e1,r3,e4e4,r2,e3e1,r1,e4e3,r1,e4e1,r1,e3e3,r1,e3e1,r1,e3 重复e3,r1,e3 重复

缺少

e4,r1,e4e2,r2,e2

是否有可能在 Prolog 中解决这个问题?形式上,是的,当然,但也有不错的表现?

解决方案

关于这个问题有很多话要说,所以这将是一个冗长而漫无边际的答案,最终不会令人满意.所以我不妨先从一个小问题开始:请不要使用 camelCaseIdentifiers 作为谓词,我们通常使用 underscore_separated_words 代替.我不知道为什么这会在 Prolog 中特别困扰我,但我怀疑部分是因为大写字母在语法上很重要.

继续,你的 weakest/3 谓词被破坏了:

?- 最弱(最弱,r2,r1).错误的.

我认为您在问题的第一个版本中拥有此权利,但随后您删除了 weakest/3 的第三个子句,因为您认为它会导致多余的答案.不用说,效率"没有正确性就毫无用处.(此外,我们通常将输出"参数放在最后,而不是放在第一位.)

您得到冗余答案的部分原因是您在 isADerivedRelation/4 中使用了对 isARelation/4 的两次(间接)递归调用.您正在计算的内容类似于直接"关系联合的传递闭包.Prolog中传递闭包的常用表达方式是这样的:

transitive_closure(A, B) :-基础关系(A,B).传递闭包(A,C):-基础关系(A,B),传递闭包(B,C).

也就是说,我们首先采取基本步骤",然后递归.如果我们的基本关系有对 abbccd,那么这将恰好找到一次解决方案 ad,作为基本对ab 和派生的传递对bd 的组合.相比之下,如果我们像你一样构造第二个子句,对 transitive_closure/2 进行两次递归调用,我们将得到两次解决方案 ad: 一次如上,但也有一次,因为我们将导出传递对 ac 并将其与 cd 组合以给出 ad.

您可以通过将 isADerivedRelation/4 中的第一个 isARelation/4 调用更改为对 isADirectRelation/3 的调用来解决此问题.

另一个问题是您错误地使用了Visited:在您证明这样的解决方案甚至存在之前,您将Source-Target 对标记为已访问!您可能应该将 Source-Intermediate 标记为已访问.

即便如此,如果图中这些元素之间存在多条不同的路径,您仍然会得到一对元素的冗余解.这就是 Prolog 逻辑的工作原理:Prolog 为您的查询找到单独的答案,但不允许您直接谈论这些答案之间的关系.如果我们想强制它把所有东西都枚举一次,就必须离开纯逻辑.

一些 Prolog 系统提供了一个称为表格"的功能,它基本上缓存了表格"谓词的所有解决方案并避免重新计算.这应该避免多余的答案,甚至简化你的定义:如果你的闭包关系被列出,你不再需要跟踪 Visited 列表,因为列表将避免循环重新计算.我不能给你经过测试的代码,因为我没有带桌子的 Prolog.即使您的系统没有提供表格,理论上也有可能使用 Prolog 的不纯数据库自己记忆"解决方案.如果没有任何多余的解决方案,很难做到完全正确.

作为不纯 Prolog 的替代方案,您的问题似乎更适合 .这些是使用类 Prolog 语法的编程模型,但具有似乎正是您想要的集合语义:命题要么是解决方案,要么不是解决方案,没有冗余解决方案的概念.整个解决方案集是一次性计算的.循环消除也是自动的,因此您不需要(实际上,由于输入语言的限制,无法使用)一个Visited 列表.如果我是你,我会尝试在 Datalog 中做到这一点.

作为进一步的 Prolog 扩展,可能有一个基于约束处理规则 (CHR) 的漂亮解决方案.但真的,请尝试使用 Datalog.

最后,我不明白你为什么认为 e2,r2,e2 是一个缺失的解决方案.我看到的从 e2e2 的唯一路径通过 e3 并通过 返回到 e2r1 关系,是最弱的,所以解应该是e2,r1,e2.

I am very inexperienced with Prolog. I have a data set that contains elements and relations in graph that has circularity (quite a lot). There are rules to calculate the summary relation of a path. One of these is: one takes the path, then takes the weakest relation, and that is the one that holds between both ends.

With Elements E1, E2, E3 and
Relations R1/R1c, R2, R3 (strength low to high) and
structure E1-R3-E2, E1-R1-E2, E2-R2-E3, E3-R1-E2

I can make the following minimal example:

% weaker table
isWeaker( r1, r2).
isWeaker( r2, r3).
weaker( X, Y) :- isWeaker( X, Y).
weaker( X, Y) :-
    isWeaker( X, Z),
    weaker( Z, Y).
% 'weakest' is <= not <
weakest( X, X, Y) :- =(X,Y).
weakest( X, X, Y) :- weaker( X, Y).

% All direct relations
isADirectRelation( e1, r1, e2).
isADirectRelation( e1, r3, e2).
isADirectRelation( e2, r2, e3).
isADirectRelation( e3, r1, e2).
isADirectRelation( e1, r3, e4).
isADirectRelation( e4, r2, e3).
isADirectRelation( e1, r1, e4).
isADirectRelation( e3, r1, e4).

% derived relations calculations

% Structural Chains
isADerivedRelation( Source, Relation, Target, Visited) :-
    \+ member( [Source,Relation,Target], Visited),
    weakest( Relation, RelationOne, RelationTwo),
    isARelation( Source, RelationOne, Intermediate, [[Source,Relation,Target]|Visited]),
    isARelation( Intermediate, RelationTwo, Target, [[Source,Relation,Target]|Visited]).

% major workhorse with anti-circularity
isARelation( Source, Relation, Target, Visited) :-
    (isADirectRelation( Source, Relation, Target);
     isADerivedRelation( Source, Relation, Target, Visited)).

The result of isARelation( Source, Relation, Target, []). is

e1,r1,e2
e1,r3,e2
e2,r2,e3
e3,r1,e2
e1,r3,e4
e4,r2,e3
e1,r1,e4
e3,r1,e4

e1,r1,e3
e3,r1,e3
e1,r1,e3 duplicate
e3,r1,e3 duplicate

Missing are

e4,r1,e4
e2,r2,e2

Is it at all possible to solve this in Prolog? Formally, yes, of course, but also with a decent performance?

解决方案

There are many things to be said about this question, so this will be a long and rambling and ultimately unsatisfactory answer. So I might as well start with a pet peeve: Please don't use camelCaseIdentifiers for predicates, we usually use underscore_separated_words instead. I'm not sure why this bugs me in Prolog in particular, but I suspect partly because uppercase letters are syntactically significant.

Moving on, your weakest/3 predicate is broken:

?- weakest(Weakest, r2, r1).
false.

I think you had this right in the first version of your question, but then you removed the third clause of weakest/3 because you thought it caused redundant answers. Needless to say, "efficiency" is useless without correctness. (Also, we usually put the "output" arguments last, not first.)

Part of the reason you get redundant answers is your use of two (indirectly) recursive calls to isARelation/4 in isADerivedRelation/4. What you are computing is something like the transitive closure of the union of "direct" relations. The usual way to express the transitive closure in Prolog is like this:

transitive_closure(A, B) :-
    base_relation(A, B).
transitive_closure(A, C) :-
    base_relation(A, B),
    transitive_closure(B, C).

That is, we first "take a base step", then recurse. If our base relation has pairs a-b, b-c, c-d, then this will find the solution a-d exactly once, as the composition of the base pair a-b and the derived transitive pair b-d. In contrast, if we were to structure the second clause as you did, with two recursive calls to transitive_closure/2, we would get the solution a-d twice: Once as above, but also once because we would derive the transitive pair a-c and compose it with c-d to give a-d.

You can fix this by changing your first isARelation/4 call in isADerivedRelation/4 into a call to isADirectRelation/3.

Another problem is that you are using Visited wrong: You are marking the pair Source-Target as visited before you have proved that such a solution even exists! You should probably mark Source-Intermediate as visited instead.

Even so, you will still get redundant solutions for a pair of elements if there are several different paths between those elements in the graph. This is just how Prolog's logic works: Prolog finds individual answers to your query but does not allow you to talk directly about relationships between those answers. If we want to force it to enumerate everything exactly once, we must leave pure logic.

Some Prolog systems offer a feature called "tabling" which essentially caches all the solutions for a "tabled" predicate and avoids re-computations. This should avoid redundant answers and even simplify your definition: If your closure relation is tabled, you no longer need to track a Visited list because cyclic recomputations will be avoided by the tabling. I can't give you tested code because I have no Prolog with tabling lying around. Even without tabling offered by your system, there is the theoretical possibility of "memoizing" solutions yourself, using Prolog's impure database. It's difficult to get it exactly right with no redundant solutions whatsoever.

As an alternative to impure Prolog, your problem seems a better fit to or . These are programming models that use Prolog-like syntax but with set semantics that seems to be exactly what you want: A proposition is either a solution or not, there is no concept of redundant solutions. The entire set of solutions is computed in one go. Cycle elimination is also automatic, so you don't need (in fact, because of the restricted input language, cannot use) a Visited list. If I were you, I would try to do this in Datalog.

As a further Prolog extension, there might be a spiffy solution based on Constraint Handling Rules (CHR). But really, do try Datalog.

Finally, I don't see why you think that e2,r2,e2 is a missing solution. The only path from e2 to e2 that I see goes through e3 and back to e2 via an r1 relation, which is the weakest one, so the solution should be e2,r1,e2.

这篇关于优化 Prolog 程序(删除重复项,重复重新计算)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆