高尔夫代码:数学表达评估器(尊重PEMDAS) [英] Code Golf: Mathematical expression evaluator (that respects PEMDAS)

查看:129
本文介绍了高尔夫代码:数学表达评估器(尊重PEMDAS)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我要求您编写一个使用PEMDAS(运算顺序:括号,乘幂,乘法,除法,加法,减法)的数学表达式评估器,而无需使用正则表达式,一个类似"Eval()"的预先存在的函数,解析库等.

I challenge you to write a mathematical expression evaluator that respects PEMDAS (order of operations: parentheses, exponentiation, multiplication, division, addition, subtraction) without using regular expressions, a pre-existing "Eval()"-like function, a parsing library, etc.

我看到了一个关于SO的预先存在的评估者挑战(此处),但是特别需要从左到右的评估.

I saw one pre-existing evaluator challenge on SO (here), but that one specifically required left-to-right evaluation.

样本输入和输出:

"-1^(-3*4/-6)" -> "1"

"-2^(2^(4-1))" -> "256"

"2*6/4^2*4/3" -> "1"

我用C#编写了一个评估器,但是想看看它与他们所选择的语言中的更聪明的程序员相比有多么糟糕.

I wrote an evaluator in C#, but would like to see how badly it compares to those of smarter programmers in their languages of choice.

代码高尔夫:评估数学表达式

说明:

  1. 让我们将此函数作为一个接受字符串参数并返回字符串结果的函数.

  1. Let's make this a function that accepts a string argument and returns a string result.

至于为什么没有正则表达式,那是为了公平起见.我认为最紧凑的正则表达式"应该面临另外一个挑战.

As for why no regexes, well, that's to level the playing field. I think there ought to be a separate challenge for "the most compact regex".

使用StrToFloat()是可以接受的.我所说的解析库"是指排除通用语法解析器之类的东西,同时也可以使游戏环境平整.

Using StrToFloat() is acceptable. By "parsing library" I meant to exclude such things as general-purpose grammar parsers, also to level the playing-field.

支持浮动.

支持paretheses,求幂和四个算术运算符.

Support paretheses, exponentiation, and the four arithmetic operators.

赋予乘法和除法优先级.

Give multiplication and division equal precedence.

加法和减法的优先级相等.

Give addition and subtraction equal precedence.

为简单起见,您可以假设所有输入的格式都正确.

For simplicity, you may assume all inputs are well-formed.

对于您的函数是否接受".1"或"1e3"之类的有效数字,我没有任何偏好,但是接受它们将为您带来布朗尼积分. ;)

I don't have a preference as to whether your function accepts such things as ".1" or "1e3" as valid numbers, but accepting them would earn you brownie points. ;)

对于被零除的情况,您可能会返回"NaN"(假设您希望实现错误处理).

For divide-by-zero cases, you could perhaps return "NaN" (assuming you wish to implement error handling).

推荐答案

C(465个字符)

#define F for(i=0;P-8;i+=2)
#define V t[i
#define P V+1]
#define S V+2]),K(&L,4),i-=2)
#define L V-2]
K(double*t,int i){for(*++t=4;*t-8;*++t=V])*++t=V];}M(double*t){int i,p,b;
F if(!P)for(p=1,b=i;i+=2,p;)P?P-1||--p||(P=8,M(t+b+2),K(t+b,i-b),i=b):++p;
F P-6||(L=pow(L,S;F P-2&&P-7||(L*=(P-7?V+2]:1/S;F P-4&&(L+=(P-5?V+2]:-S;
F L=V];}E(char*s,char*r){double t[99];char*e,i=2,z=0;for(;*s;i+=2)V]=
strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;P=8;M(t+2);sprintf(r,"%g",*t);}

前五行是必需的,其余的只是为了便于阅读.我已经将前五个换行符算作一个字符.如果要以行为单位进行度量,则在删除所有空白之前需要28行,但这是一个毫无意义的数字.可能是6行到100万行,具体取决于我格式化的方式.

The first five newlines are required, the rest are there just for readability. I've counted the first five newlines as one character each. If you want to measure it in lines, it was 28 lines before I removed all the whitespace, but that's a pretty meaningless number. It could have been anything from 6 lines to a million, depending on how I formatted it.

入口是 E() (用于评估").第一个参数是输入字符串,第二个参数指向输出字符串,并且必须由调用方分配(按照常规C标准).它最多可以处理47个令牌,其中令牌可以是运算符("+-*/^()"之一)或浮点数.一元符号运算符不算作单独的令牌.

The entry point is E() (for "evaluate"). The first parameter is the input string, and the second parameter points to the output string, and must be allocated by the caller (as per usual C standards). It can handle up to 47 tokens, where a token is either an operator (one of "+-*/^()"), or a floating point number. Unary sign operators do not count as a separate token.

此代码大致基于我几年前做的一个项目.我排除了所有错误处理和空白跳动,并使用高尔夫技术对其进行了重新设计.下面是28行,其格式足以使我能够编写,但可能不足以读取.您需要#include <stdlib.h><stdio.h><math.h>(或参见底部的注释).

This code is loosely based on a project I did many years ago as an exercise. I took out all the error handling and whitespace skipping and retooled it using golf techniques. Below are the 28 lines, with enough formatting that I was able to write it, but probably not enough to read it. You'll want to #include <stdlib.h>, <stdio.h>, and <math.h> (or see note at the bottom).

请参阅代码后的工作原理说明.

See after the code for an explanation of how it works.

#define F for(i=0;P-8;i+=2)
#define V t[i
#define P V+1]
#define S V+2]),K(&L,4),i-=2)
#define L V-2]
K(double*t,int i){
    for(*++t=4;*t-8;*++t=V])
        *++t=V];
}
M(double*t){
    int i,p,b;
    F if(!P)
        for(p=1,b=i;i+=2,p;)
            P?P-1||--p||(P=8,M(t+b+2),K(t+b,i-b),i=b):++p;
    F P-6||(L=pow(L,S;
    F P-2&&P-7||(L*=(P-7?V+2]:1/S;
    F P-4&&(L+=(P-5?V+2]:-S;
    F L=V];
}
E(char*s,char*r){
    double t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V]=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    sprintf(r,"%g",*t);
}

第一步是标记化.双精度数组包含每个令牌的两个值,一个运算符(P,因为O看起来太像零)和一个值(V).这种标记化是在 E() 中的for循环中完成的.它还处理任何一元+-运算符,并将它们合并到常量中.

The first step is to tokenize. The array of doubles contains two values for each token, an operator (P, because O looks too much like zero), and a value (V). This tokenizing is what is done in the for loop in E(). It also deals with any unary + and - operators, incorporating them into the constant.

令牌数组的运算符"字段可以具有以下值之一:

The "operator" field of the token array can have one of the following values:

0 :(
1 :)
2 :*
3 :+
4 :浮点常量值
5 :-
6 :^
7 :/
8 :令牌字符串的结尾

0: (
1: )
2: *
3: +
4: a floating-point constant value
5: -
6: ^
7: /
8: end of token string

该方案主要由 Daniel Martin 派生,他注意到在此挑战中,每个操作符的ASCII表示形式的后3位都是唯一的.

This scheme was largely derived by Daniel Martin, who noticed that the last 3 bits were unique in the ASCII representation of each of the operators in this challenge.

未压缩的E()版本看起来像这样:

An uncompressed version of E() would look something like this:

void Evaluate(char *expression, char *result){
    double tokenList[99];
    char *parseEnd;
    int i = 2, prevOperator = 0;
    /* i must start at 2, because the EvalTokens will write before the
     * beginning of the array.  This is to allow overwriting an opening
     * parenthesis with the value of the subexpression. */
    for(; *expression != 0; i += 2){
        /* try to parse a constant floating-point value */
        tokenList[i] = strtod(expression, &parseEnd);

        /* explanation below code */
        if(parseEnd != expression && prevOperator != 4/*constant*/ &&
           prevOperator != 1/*close paren*/){
            expression = parseEnd;
            prevOperator = tokenList[i + 1] = 4/*constant*/;
        }else{
            /* it's an operator */
            prevOperator = tokenList[i + 1] = *expression & 7;
            expression++;
        }
    }

    /* done parsing, add end-of-token-string operator */
    tokenList[i + 1] = 8/*end*/

    /* Evaluate the expression in the token list */
    EvalTokens(tokenList + 2); /* remember the offset by 2 above? */

    sprintf(result, "%g", tokenList[0]/* result ends up in first value */);
}

由于我们保证输入有效,所以解析失败的唯一原因是因为下一个标记是运算符.如果发生这种情况,parseEnd指针将与tokenStart相同.我们还必须处理解析成功的情况,但我们真正想要的是运算符.对于加减运算符,除非直接跟随符号运算符,否则会发生这种情况.换句话说,给定表达式"4-6",我们希望将其解析为{4, -, 6},而不是{4, -6}.另一方面,给定"4+-6",我们应该将其解析为{4, +, -6}.解决方案非常简单.如果解析失败 OR ,则前面的令牌是常量或右括号(有效地是一个子表达式,其结果将为常量),则当前令牌是运算符,否则为常量.

Since we're guaranteed valid input, the only reason the parsing would fail would be because the next token is an operator. If this happens, the parseEnd pointer will be the same as, tokenStart. We must also handle the case where parsing succeeded, but what we really wanted was an operator. This would occur for the addition and subtraction operators, unless a sign operator directly followed. In other words, given the expression "4-6", we want to parse it as {4, -, 6}, and not as {4, -6}. On the other hand, given "4+-6", we should parse it as {4, +, -6}. The solution is quite simple. If parsing fails OR the preceding token was a constant or a closing parenthesis (effectively a subexpression which will evaluate to a constant), then the current token is an operator, otherwise it's a constant.

完成标记化之后,通过调用 M() 完成计算和折叠,该操作首先查找任何匹配的括号对,然后通过递归调用自身来处理其中包含的子表达式.然后它处理运算符,首先是求幂,然后是乘法和除法,最后是加法和减法.由于预期格式正确的输入(如挑战中所述),因此它不会显式检查加法运算符,因为它是处理所有其他运算符后的最后一个合法运算符.

After tokenizing is done, calculating and folding are done by calling M(), which first looks for any matched pairs of parentheses and processes the subexpressions contained within by calling itself recursively. Then it processes operators, first exponentiation, then multiplication and division together, and finally addition and subtraction together. Because well-formed input is expected (as specified in the challenge), it doesn't check for the addition operator explicitly, since it's the last legal operator after all the others are processed.

缺少高尔夫球压缩功能的计算功能如下所示:

The calculation function, lacking golf compression, would look something like this:

void EvalTokens(double *tokenList){
    int i, parenLevel, parenStart;

    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 0/*open paren*/)
            for(parenLevel = 1, parenStart = i; i += 2, parenLevel > 0){
                if(tokenList[i + 1] == 0/*another open paren*/)
                    parenLevel++;
                else if(tokenList[i + 1] == 1/*close paren*/)
                    if(--parenLevel == 0){
                        /* make this a temporary end of list */
                        tokenList[i + 1] = 8;
                        /* recursively handle the subexpression */
                        EvalTokens(tokenList + parenStart + 2);
                        /* fold the subexpression out */
                        FoldTokens(tokenList + parenStart, i - parenStart);
                        /* bring i back to where the folded value of the
                         * subexpression is now */
                        i = parenStart;
                    }
            }

    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 6/*exponentiation operator (^)*/){
            tokenList[i - 2] = pow(tokenList[i - 2], tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 2/*multiplication operator (*)*/ ||
           tokenList[i + 1] == 7/*division operator (/)*/){
            tokenList[i - 2] *=
                (tokenList[i + 1] == 2 ?
                    tokenList[i + 2] :
                    1 / tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] != 4/*constant*/){
            tokenList[i - 2] +=
                (tokenList[i + 1] == 3 ?
                    tokenList[i + 2] :
                    -tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    tokenList[-2] = tokenList[0];
    /* the compressed code does the above in a loop, equivalent to:
     *
     * for(i = 0; tokenList[i + 1] != 8; i+= 2)
     *     tokenList[i - 2] = tokenList[i];
     *
     * This loop will actually only iterate once, and thanks to the
     * liberal use of macros, is shorter. */
}

一些压缩量可能会使此功能更容易读取.

Some amount of compression would probably make this function easier to read.

一旦执行了操作,操作数和操作符就会由 K() (通过宏 S 调用)从令牌列表中折叠出来.运算结果保留为常量,而不是折叠表达式.因此,最终结果留在令牌数组的开头,因此当控件返回到 E() 时,它利用到了第一个值的事实,将其简单地打印到字符串中.数组是令牌的值字段.

Once an operation is performed, the operands and operator are folded out of the token list by K() (called through the macro S). The result of the operation is left as a constant in place of the folded expression. Consequently, the final result is left at the beginning of the token array, so when control returns to E(), it simply prints that to a string, taking advantage of the fact that the first value in the array is the value field of the token.

FoldTokens()的调用发生在执行操作(^*/+-)或子表达式之后(用括号括起来)已处理. FoldTokens()例程确保结果值具有正确的运算符类型(4),然后复制子表达式的较大表达式的其余部分.例如,当处理表达式"2+6*4+1"时,EvalTokens()首先计算6*4,将结果替换为6(2+24*4+1). FoldTokens()然后删除子表达式"24*4"的其余部分,留下2+24+1.

This call to FoldTokens() takes place either after an operation (^, *, /, +, or -) has been performed, or after a subexpression (surrounded by parentheses) has been processed. The FoldTokens() routine ensures that the result value has the correct operator type (4), and then copies the rest of the larger expression of the subexpression. For instance, when the expression "2+6*4+1" is processed, EvalTokens() first calculates 6*4, leaving the result in place of the 6 (2+24*4+1). FoldTokens() then removes the rest of the sub expression "24*4", leaving 2+24+1.

void FoldTokens(double *tokenList, int offset){
    tokenList++;
    tokenList[0] = 4; // force value to constant

    while(tokenList[0] != 8/*end of token string*/){
        tokenList[0] = tokenList[offset];
        tokenList[1] = tokenList[offset + 1];
        tokenList += 2;
    }
}

就是这样.宏只是用来替换常见的操作,而其他所有操作都只是上述操作的压缩.

That's it. The macros are just there to replace common operations, and everything else is just golf-compression of the above.

strager 坚持认为该代码应包括#include语句,因为如果没有正确地对strtodpow及其函数进行正向断言,它将无法正常运行.由于挑战仅要求一个功能而不是一个完整的程序,因此我认为这不是必需的.但是,可以通过添加以下代码来以最低的成本添加前向声明:

strager insists that the code should include #include statements, as it will not function correctly without a proper forward declation of the strtod and pow and functions. Since the challenge asks for just a function, and not a complete program, I hold that this should not be required. However, forward declarations could be added at minimal cost by adding the following code:

#define D double
D strtod(),pow();

然后我将代码中的所有"double"实例替换为"D".这将在代码中添加19个字符,从而使总数达到484.另一方面,我也可以将函数转换为返回双精度而不是字符串,就像他一样,它将修剪15个字符,从而更改 E() 功能:

I would then replace all instances of "double" in the code with "D". This would add 19 characters to the code, bringing the total up to 484. On the other hand, I could also convert my function to return a double instead of a string, as did he, which would trim 15 characters, changing the E() function to this:

D E(char*s){
    D t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V]=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    return*t;
}

这将使总代码大小为469个字符(或452,不带strtodpow的前向声明,但带有D宏).甚至可以通过要求调用者将指针传递给返回值的double来修剪另外1个字符:

This would make the total code size 469 characters (or 452 without the forward declarations of strtod and pow, but with the D macro). It would even be possible to trim 1 more characters by requiring the caller to pass in a pointer to a double for the return value:

E(char*s,D*r){
    D t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    *r=*t;
}

我将它留给读者来决定哪个版本合适.

I'll leave it to the reader to decide which version is appropriate.

这篇关于高尔夫代码:数学表达评估器(尊重PEMDAS)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆