About a Prolog tokenizer
Problem description
One of my assignments asks us to build a Prolog tokenizer. Right now I have written a predicate that can change spaces and tabs into newlines, but I don't know how to use it in the main program.
The replace part looks like this:
% replace(Old, New, List0, List): List is List0 with every Old replaced by New.
replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]):- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).
And the main part has a predicate called removewhite(List1, List2). So how can I let removewhite execute replace?
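For reference, the literal wiring the question asks about could look like the sketch below. It assumes removewhite/2 is meant to turn every space and tab code into a newline code; that semantics is inferred from the description, not stated anywhere. The answer argues this is not a useful step toward a tokenizer, so treat it only as an illustration of calling replace/4 from another predicate.

```prolog
% replace(Old, New, List0, List): the asker's predicate, unchanged.
replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]) :- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).

% removewhite(+Codes0, -Codes): ASSUMED semantics. Every space and tab
% code becomes a newline code, done by chaining two replace/4 calls.
removewhite(Codes0, Codes) :-
    char_code(' ', Space),
    char_code('\t', Tab),
    char_code('\n', Newline),
    replace(Space, Newline, Codes0, Codes1),   % spaces -> newlines
    replace(Tab, Newline, Codes1, Codes).      % tabs   -> newlines
```

For example, `?- atom_codes('a b\tc', Cs), removewhite(Cs, Out), atom_codes(A, Out).` binds A to the atom 'a\nb\nc'.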
You are a bit off track for a tokenizer: removewhite/2 isn't going to buy you any useful functionality. Instead, consider a DCG (provided, of course, that your Prolog supports them):
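% tokenize(+Codes, -Tokens): parse a list of character codes into a list of tokens.
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).
tokenize([]) --> [].
% Skip a run of blanks (spaces/tabs), then continue.
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.
% code_types(Type, Codes): greedily consume the longest run of codes of code_type/2 Type.
code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].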
Despite its simplicity, this is a fairly efficient and easily extensible scanner. In SWI-Prolog, which has (non-ISO-compliant) extensions for efficient string handling, it can be called from the top level like this:
?- tokenize(`123 4 567 `, L).
L = [123, 4, 567]
or
?- atom_codes('123 4 567 ',Cs), tokenize(Cs, L).
Cs = [49, 50, 51, 32, 32, 52, 32, 53, 54|...],
L = [123, 4, 567]
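To illustrate the "easily extensible" claim, here is a sketch that adds one clause so identifiers such as foo or bar_1 become atoms. The word//1 nonterminal is a hypothetical helper introduced for this example; it relies on the csymf (letter or underscore) and csym (letter, digit or underscore) types that SWI-Prolog's code_type/2 provides. Everything else is the tokenizer from the answer.

```prolog
% tokenize(+Codes, -Tokens): the answer's scanner plus one extra clause.
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
% Illustrative extension: an identifier becomes an atom token.
tokenize([Word|Tokens]) --> word(Word), tokenize(Tokens).

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N, [C|Cs])}.
% word//1 (hypothetical): a letter or underscore, then letters/digits/underscores.
word(W) --> [C], {code_type(C, csymf)}, code_types(csym, Cs), {atom_codes(W, [C|Cs])}.

% Greedy run of codes of a given code_type/2 type (unchanged).
code_types(Type, [C|Cs]) --> [C], {code_type(C, Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
```

With this, `?- atom_codes('foo 42 bar_1', Cs), tokenize(Cs, L).` gives L = [foo, 42, bar_1]. Because digits are not csymf, a token that starts with a digit is still scanned as a number.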
By the way, in SWI-Prolog number//1 is predefined (with much more functionality, of course) in library(dcg/basics).
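A minimal sketch of the same number scanner built on that library, assuming SWI-Prolog. It uses blanks//0 and number//1 from library(dcg/basics); tokenize_numbers/2 and numbers//1 are hypothetical names chosen here so they do not clash with the hand-written version. Note that blanks//0 skips any layout (code_type `space`), including newlines, whereas the answer's skip_spaces//0 only skips `white` (spaces and tabs).

```prolog
:- use_module(library(dcg/basics)).   % blanks//0, number//1

% tokenize_numbers(+Codes, -Numbers): hypothetical name.
tokenize_numbers(Codes, Numbers) :-
    phrase(numbers(Numbers), Codes).

% Skip leading layout, read one number, commit, and continue;
% when no further number follows, consume trailing layout and stop.
numbers([N|Ns]) --> blanks, number(N), !, numbers(Ns).
numbers([])     --> blanks.
```

For example, `?- atom_codes('123 4 567 ', Cs), tokenize_numbers(Cs, L).` gives L = [123, 4, 567]. According to its documentation, the library's number//1 also reads signed integers and floats.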
Anyway, about your question:
"How can I let removewhite execute replace?"
I feel you're really barking up the wrong tree: removing a space, which actually is a separator, will mess up your input...
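To make the separator point concrete, here is a small sketch that assumes "removewhite" meant deleting blanks outright. strip_white/2 and is_white/1 are hypothetical names. Once the separators are gone, three separate numbers collapse into a single one, and no tokenizer can recover the original boundaries.

```prolog
:- use_module(library(apply)).   % exclude/3

% is_white(+Code): true for space and tab codes.
is_white(C) :- code_type(C, white).

% strip_white(+Codes0, -Codes): hypothetical "removewhite" that deletes
% every space and tab code instead of replacing it.
strip_white(Codes0, Codes) :-
    exclude(is_white, Codes0, Codes).
```

For instance, `?- atom_codes('123 4 567', Cs0), strip_white(Cs0, Cs), number_codes(N, Cs).` gives N = 1234567 instead of the three tokens 123, 4 and 567.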