Token Parser

Problem Description

I'm trying to write a function to parse a Reverse Polish Notation string
from stdin and return 1 token at a time. For those of you who are unaware,
an RPN string looks like this:

1 2 + 4 * 3 +

Each number is read and pushed onto a stack. When an operator is
encountered, the top two numbers are popped off the stack and combined
using that operator, and the result is pushed back onto the stack.
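The evaluation rule above can be sketched in C as follows: push each number, and on an operator pop the top two values, apply the operator, and push the result. This is an illustration only, not code from the thread. `eval_rpn` and `STACK_MAX` are hypothetical names. The sketch assumes the whole expression is one string of whitespace-separated words, and it does not check for integer overflow.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64  /* hypothetical fixed stack depth */

/* Evaluate a whitespace-separated RPN expression of integers.
   Returns 0 and stores the result in *out on success; returns -1 on
   stack underflow/overflow, an unrecognized word, division by zero,
   or if anything other than exactly one value is left at the end.
   Integer overflow is not checked. */
int eval_rpn(const char *expr, long *out)
{
    long stack[STACK_MAX];
    int top = 0;                      /* number of values on the stack */
    char buf[256];
    char *word;

    if (strlen(expr) >= sizeof buf)
        return -1;
    strcpy(buf, expr);                /* strtok modifies its argument */

    for (word = strtok(buf, " \t\n"); word != NULL;
         word = strtok(NULL, " \t\n")) {
        if (strlen(word) == 1 && strchr("+-*/", word[0]) != NULL) {
            long a, b;
            if (top < 2)
                return -1;            /* an operator needs two operands */
            b = stack[--top];         /* right operand is on top */
            a = stack[--top];
            switch (word[0]) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/':
                if (b == 0)
                    return -1;
                stack[top++] = a / b;
                break;
            }
        } else {
            char *end;
            long v = strtol(word, &end, 10);
            if (*end != '\0' || top == STACK_MAX)
                return -1;            /* not a whole number, or stack full */
            stack[top++] = v;
        }
    }
    if (top != 1)
        return -1;
    *out = stack[0];
    return 0;
}
```

With this sketch, `eval_rpn("1 2 + 4 * 3 +", &r)` should leave 15 in `r`, since ((1 + 2) * 4) + 3 = 15. It sidesteps the tokenizing problem by letting `strtok` and `strtol` split and convert the words. The replies below deal with reading tokens character by character.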

The problem I'm having is differentiating between, for example, the +
character and the number 43 (in ASCII, '+' has the character code 43).
Here's what I have so far:

#include <stdio.h>

int read_token(void) {
    int token, next_token;

    while ((token = getchar()) != EOF) {
        if (token == ' ' || token == '\n') {
            continue;
        }
        if (token == '+' || token == '-' || token == '*' || token == '/') {
            break;
        } else {
            while ((next_token = getchar()) != ' ' && next_token != EOF) {
                token *= 10;
                token += next_token;
            }
            break;
        }
    }

    return token;
}

What's the conventional wisdom on overcoming the ambiguity between
characters and numbers in C?

Thanks.

Recommended Answer

Simon Morgan wrote:
> I'm trying to write a function to parse a Reverse Polish Notation string
> from stdin and return 1 token at a time. For those of you who are unaware
> an RPN string looks like this:
>
> 1 2 + 4 * 3 +
>
> With each number being read and pushed onto a stack and, when an operator
> is encountered, each number on the stack is popped off and calculated
> using that operator and the result pushed onto the stack.
>
> The problem I'm having is differentiating between for example, the +
> character and the number 43. Here's what I have so far:

A man who provides a problem description and the problem code. A rarity
that it is pleasant to encounter :-)

> #include <stdio.h>
>
> int read_token(void) {
>     int token, next_token;
>
>     while ((token = getchar()) != EOF) {
>         if (token == ' ' || token == '\n') {
>             continue;
>         }
>         if (token == '+' || token == '-' || token == '*' || token == '/') {
>             break;
>         } else {

You probably want to check that it was a digit here as opposed to a
letter or some other punctuation and error if it was not. The standard
isdigit function could be of use!

>             while ((next_token = getchar()) != ' ' && next_token != EOF) {

Again here.

>                 token *= 10;
>                 token += next_token;
>             }
>             break;
>         }
>     }
>
>     return token;
> }

What about explicit positive and negative numbers? e.g.
+43 -3 *
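One way to handle explicit signs uses one character of lookahead. This is an illustration, not something proposed in the thread. A `+` or `-` immediately followed by a digit is treated as the sign of a number; otherwise it is an operator. To keep the sketch testable, it reads from a string through a cursor (`const char **p`) instead of stdin. `next_token` and the `T_*` names are hypothetical. With stdin, the same peek would be done with `getchar` followed by `ungetc`.

```c
#include <ctype.h>

/* Hypothetical token kinds for this sketch. */
enum kind { T_END, T_NUMBER, T_OP, T_OTHER };

/* Read one token from the string at *p and advance *p past it.
   A '+' or '-' immediately followed by a digit is taken as the sign
   of a number; otherwise it is returned as an operator in *op. */
enum kind next_token(const char **p, long *num, int *op)
{
    const char *s = *p;
    int sign = 1;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        *p = s;
        return T_END;
    }
    if ((*s == '+' || *s == '-') && isdigit((unsigned char)s[1])) {
        sign = (*s == '-') ? -1 : 1;   /* the sign belongs to the number */
        s++;
    }
    if (isdigit((unsigned char)*s)) {
        long v = 0;
        while (isdigit((unsigned char)*s))
            v = v * 10 + (*s++ - '0');  /* convert digit characters to values */
        *num = sign * v;
        *p = s;
        return T_NUMBER;
    }
    *op = (unsigned char)*s++;         /* single non-digit character */
    *p = s;
    return (*op == '+' || *op == '-' || *op == '*' || *op == '/')
           ? T_OP : T_OTHER;
}
```

This relies on the convention that an operator is always followed by whitespace or the end of input. Under that convention, `+43 -3 *` yields the numbers 43 and -3 and then the operator `*`.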

> What's the conventional wisdom on overcoming the ambiguity between
> characters and numbers in C?




I would say that the best option is to return a token value indicating
a number (possibly separate tokens for integer, real, etc.) and return
the value separately. You could either return a structure with both
items (not unreasonable) or return one of them by passing a pointer to
where it should be stored.
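Here is a minimal sketch of the structure-returning variant described above. It uses a hypothetical `struct token` with separate `kind` and value fields; none of these names come from the thread. It reads from any `FILE *` with `getc` and `ungetc`, so you can pass `stdin` for the original use. It handles only unsigned integers and the four operators.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical token representation: `kind` says which field is valid. */
enum token_kind { TOK_EOF, TOK_NUMBER, TOK_OPERATOR, TOK_ERROR };

struct token {
    enum token_kind kind;
    long number;      /* valid when kind == TOK_NUMBER */
    int op;           /* the operator, or the offending character for TOK_ERROR */
};

/* Read the next token from `in`, returning its kind and value together. */
struct token read_token(FILE *in)
{
    struct token t = { TOK_EOF, 0, 0 };
    int c;

    do {                              /* skip whitespace */
        c = getc(in);
    } while (c != EOF && isspace(c));

    if (c == EOF)
        return t;

    if (isdigit(c)) {
        t.kind = TOK_NUMBER;
        while (isdigit(c)) {
            t.number = t.number * 10 + (c - '0');
            c = getc(in);
        }
        if (c != EOF)
            ungetc(c, in);            /* push back the character after the number */
    } else {
        t.op = c;
        t.kind = (c == '+' || c == '-' || c == '*' || c == '/')
                 ? TOK_OPERATOR : TOK_ERROR;
    }
    return t;
}
```

The character '+' only ever lands in `op`, and the number 43 only ever lands in `number`. The caller switches on `kind`, so the two can no longer be confused. The pointer-passing alternative has the same shape: the kind is the return value and the number is written through a `long *` parameter.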
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.


On Sat, 23 Jul 2005 09:55:04 GMT, Simon Morgan
<me@privacy.net> wrote:
> What's the conventional wisdom on overcoming the ambiguity between
> characters and numbers in C?





Well, it's obvious that you aren't returning enough information. You
need to return the type of the token (operator or number) as well as its
value (type of operator, value of the number). Generally, either return
the type and value separately, or decide that certain values are "out of
bounds" (negative numbers, for instance) and assign the operators in
that range.

In your case you seem to handle only positive integers, so you could say
for instance:

#define PLUS (-'+')
#define MINUS (-'-')
#define TIMES (-'*')

etc. Then, when you get an operator, set token = -token.
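The negative-encoding idea above can be sketched as follows. For ease of testing, input comes from a string cursor rather than stdin. `PLUS`, `MINUS` and `TIMES` follow the reply. `DIVIDE`, `END_OF_INPUT`, `BAD_INPUT` and this `read_token` signature are hypothetical additions for the sketch.

```c
#include <ctype.h>

/* Operators are encoded as their negated character codes, so they can
   never collide with a non-negative number. */
#define PLUS   (-'+')
#define MINUS  (-'-')
#define TIMES  (-'*')
#define DIVIDE (-'/')       /* hypothetical: the reply's "etc." */
#define END_OF_INPUT (-1)   /* hypothetical: no operator maps to -1 */
#define BAD_INPUT    (-2)   /* hypothetical error marker */

/* Return the next token from *p and advance *p past it. A value >= 0 is
   a number; PLUS, MINUS, TIMES or DIVIDE is an operator. */
int read_token(const char **p)
{
    const char *s = *p;
    int value;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        *p = s;
        return END_OF_INPUT;
    }
    if (isdigit((unsigned char)*s)) {
        value = 0;
        while (isdigit((unsigned char)*s))
            value = value * 10 + (*s++ - '0');
    } else if (*s == '+' || *s == '-' || *s == '*' || *s == '/') {
        value = -*s++;      /* e.g. '+' becomes PLUS */
    } else {
        value = BAD_INPUT;  /* skip the unexpected character */
        s++;
    }
    *p = s;
    return value;
}
```

Here the number 43 comes back as 43, while `+` comes back as `PLUS` (-43), so the two are distinct. The trick depends on numbers never being negative. Once signed numbers are allowed, a separate type field is needed, as the other replies suggest.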

Your code is also not at all safe. You are saying that anything not one
of your 'known' tokens or space is a digit; that will give you very
strange results with "abc def +" or even "123.567". You really need to
decide whether the token is:

An operator
A number
Something else

and possibly give an error if it's "something else".
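Here is a sketch of the three-way decision described above. It assumes the input has already been split into whitespace-delimited words, for example with `scanf("%63s", buf)` into a `char buf[64]`. `classify` and `enum word_class` are hypothetical names. The sketch uses `strtol`'s end pointer to require that the whole word is a number. As a result, "123.567" and "abc" fall into the "something else" bucket instead of being misread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification for one whitespace-delimited word. */
enum word_class { WORD_OPERATOR, WORD_NUMBER, WORD_OTHER };

/* Decide whether `word` is an operator, a whole integer, or something
   else. A number must be consumed completely by strtol, so "123.567"
   and "12abc" are rejected. Overflow (ERANGE) is not checked here. */
enum word_class classify(const char *word, long *value)
{
    char *end;

    /* exactly one character, and it is one of the four operators */
    if (word[0] != '\0' && word[1] == '\0' && strchr("+-*/", word[0]) != NULL)
        return WORD_OPERATOR;

    *value = strtol(word, &end, 10);
    if (end != word && *end == '\0')
        return WORD_NUMBER;
    return WORD_OTHER;
}
```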

Chris C


Simon Morgan wrote:
> I'm trying to write a function to parse a Reverse Polish Notation string
> from stdin and return 1 token at a time. For those of you who are unaware
> an RPN string looks like this:
>
> 1 2 + 4 * 3 +
>
> With each number being read and pushed onto a stack and, when an operator
> is encountered, each number on the stack is popped off and calculated
> using that operator and the result pushed onto the stack.
>
> The problem I'm having is differentiating between for example, the +
> character and the number 43. Here's what I have so far:
>
> #include <stdio.h>
>
> int read_token(void) {
>     int token, next_token;
>
>     while ((token = getchar()) != EOF) {
>         if (token == ' ' || token == '\n') {
>             continue;
>         }
>         if (token == '+' || token == '-' || token == '*' || token == '/') {
>             break;
>         } else {
>             while ((next_token = getchar()) != ' ' && next_token != EOF) {
>                 token *= 10;
>                 token += next_token;
>             }
>             break;
>         }
>     }
>
>     return token;
> }




There are several problems here. First, you are confusing tokens
with input characters. While a token can consist of a single
character, it is really a higher-level lexical object that has
both a type and a value. Your code tries to encode both type and
value in a single integer, but since one of your token types is
integer, you are bound to have problems. The usual solution is
to handle the token type and its value separately.

You are also confusing the characters '0'-'9' with the numbers 0-9.
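The sketch below (hypothetical function names, not from the thread) contrasts two computations. The first is what the question's loop effectively computes: it adds raw character codes. The second uses the usual `c - '0'` conversion. The C standard guarantees that '0' through '9' have consecutive codes, so the subtraction is portable. The specific value 571 assumes an ASCII character set, where '4' is 52 and '3' is 51.

```c
/* What the question's loop computes: it accumulates raw character
   codes, so on an ASCII system "43" becomes 52 * 10 + 51 = 571. */
int accumulate_codes(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++)
        n = n * 10 + *s;          /* '4' contributes 52 in ASCII, not 4 */
    return n;
}

/* The usual fix: subtract '0' to turn each digit character into its
   numeric value. Assumes s contains only the characters '0'..'9'. */
int accumulate_digits(const char *s)
{
    int n = 0;
    for (; *s != '\0'; s++)
        n = n * 10 + (*s - '0');
    return n;
}
```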

You should take a look at lex or flex to get some ideas on
lexing. You should also stop by comp.compilers.

Here is a sample to give you the idea. It is hardly robust, but
should get you started.

/* --------------- snip --------------- */
#include <stdio.h>
#include <ctype.h>

#define UNKNOWN 1
#define NUMBER 2
#define OP 3

union val {
    int num;
    char op;
    char ch;
};

int token(union val *val)
{
    int tok = 0;
    int c;

    while ( (c = getchar()) != EOF ) {
        if ( isspace(c) ) {
            /* do nothing */
        }
        else if ( isdigit(c) ) {
            tok = NUMBER;
            val->num = c - '0';
            while ( isdigit(c = getchar()) ) {
                val->num *= 10;
                val->num += c - '0';
            }
            if ( !isdigit(c) ) {
                ungetc(c, stdin);
            }
            break;
        }
        else if ( c == '+' || c == '*' || c == '-' || c == '/' ) {
            tok = OP;
            val->op = c;
            break;
        }
        else {
            tok = UNKNOWN;
            val->ch = c;
            break;
        }
    }

    return tok;
}

int main(int argc, char *argv[])
{
    union val val;
    int tok;

    while ( (tok = token(&val)) ) {
        switch (tok) {
        case NUMBER:
            printf("NUMBER = %d\n", val.num);
            break;
        case OP:
            printf("OP = %c\n", val.op);
            break;
        case UNKNOWN:
            printf("unexpected character: %c\n", val.ch);
            break;
        default:
            printf("unknown token type: %d\n", tok);
        }
    }

    return 0;
}
/* --------------- snip --------------- */
--
Thomas M. Sommers -- tm*@nj.net -- AB2SB

