About a Prolog tokenizer


Problem description

One of my assignments asks us to build a Prolog tokenizer. So far I have written a predicate that replaces spaces and tabs with newlines, but I don't know how to integrate it into the main program.

The replace part looks like this:

replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]):- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).

The main part has a predicate called removewhite(List1, List2).

So how can I make removewhite call replace?

Solution

You are a bit off track for a tokenizer: removewhite/2 isn't going to buy you any useful functionality. Instead, consider a DCG (assuming your Prolog supports DCGs):

tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.

code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].

Despite its simplicity, this is a fairly efficient scanner and is easy to extend. In SWI-Prolog, which has (non-ISO-compliant) extensions for efficient string handling, it can be called from the top level like this:

?- tokenize(`123  4 567  `, L).
L = [123, 4, 567]

or

?- atom_codes('123  4 567  ',Cs), tokenize(Cs, L).
Cs = [49, 50, 51, 32, 32, 52, 32, 53, 54|...],
L = [123, 4, 567] 
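As an illustration of the "easily extensible" point, here is a minimal sketch that adds one more token kind to the scanner above. The word//1 nonterminal and the word(Atom) token shape are assumptions made for this example, not part of the original answer; the grammar is renamed tokens//1 only to keep the sketch self-contained.

```prolog
% Sketch only: the scanner from the answer plus a hypothetical word token.
tokenize(Codes, Tokens) :- phrase(tokens(Tokens), Codes).

tokens([]) --> [].
tokens(Tokens) --> skip_spaces, tokens(Tokens).
tokens([Number|Tokens]) --> number(Number), tokens(Tokens).
% New clause (assumption): a run of alphabetic codes becomes word(Atom).
tokens([word(Atom)|Tokens]) --> word(Atom), tokens(Tokens).

% One or more white-space codes.
skip_spaces --> code_types(white, [_|_]).
% One or more digits, converted to an integer.
number(N) --> code_types(digit, [C|Cs]), { number_codes(N, [C|Cs]) }.
% One or more codes that code_type/2 classifies as alpha, converted to an atom.
word(A) --> code_types(alpha, [C|Cs]), { atom_codes(A, [C|Cs]) }.

% Greedily collect the longest run of codes of the given type.
code_types(Type, [C|Cs]) --> [C], { code_type(C, Type) }, !, code_types(Type, Cs).
code_types(_, []) --> [].
```

With this sketch loaded, `?- atom_codes('foo 12 bar', Cs), tokenize(Cs, L).` should give L = [word(foo), 12, word(bar)] as its first solution. Other token kinds (punctuation, operators) could be added the same way, one clause per kind.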

Btw, in SWI-Prolog, number//1 is predefined (with much more functionality, of course) in library(dcg/basics).
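A minimal sketch of the same number scanner built on library(dcg/basics), assuming SWI-Prolog. The names tokenize_basics/2 and numbers//1 are made up for this example; blanks//0 and number//1 come from the library. Note that this file must not also define its own number//1, since that would clash with the imported one.

```prolog
:- use_module(library(dcg/basics)).

% tokenize_basics(+Codes, -Numbers): hypothetical wrapper around phrase/2.
tokenize_basics(Codes, Numbers) :- phrase(numbers(Numbers), Codes).

% Skip optional white space, read one number, commit, and continue.
numbers([N|Ns]) --> blanks, number(N), !, numbers(Ns).
% No further number: allow trailing white space and stop.
numbers([]) --> blanks.
```

Because the library's number//1 also accepts signs and floats, this version recognizes more input than the hand-written digit scanner.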

Anyway, about your question

how can I let removewhite execute replace?

I feel you're really barking up the wrong tree: removing a space, which is actually a separator, will mess up your input...
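For completeness, a literal answer to the question could look like the sketch below. It assumes removewhite/2 takes a list of character codes and should turn spaces and tabs into newlines, as the question describes; the argument names and the body are guesses, since the main program is not shown. Because the separators are replaced rather than removed, the input stays delimited, but the result is still a flat code list, not a list of tokens.

```prolog
% replace/4 exactly as given in the question.
replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]) :- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).

% removewhite(+Codes0, -Codes): hypothetical body that chains two replace/4 calls.
removewhite(Codes0, Codes) :-
    replace(32, 10, Codes0, Codes1),   % space (32) -> newline (10)
    replace(9, 10, Codes1, Codes).     % tab (9)    -> newline (10)
```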
