Add multiplication signs (*) between coefficients


Problem description

I have a program in which a user inputs a function, such as sin(x)+1. I'm using ast to try to determine if the string is 'safe' by whitelisting components as shown in this answer. Now I'd like to parse the string and insert multiplication signs (*) wherever a coefficient is written without one.

For example:

  • 3x -> 3*x
  • 4(x+5) -> 4*(x+5)
  • sin(3x)(4) -> sin(3x)*(4) (sin is already in globals; otherwise this would be s*i*n*(3x)*(4))

Are there any efficient algorithms to accomplish this? I'd prefer a Pythonic solution, i.e. not complex regexes. That's not because regexes are unpythonic, but because I don't understand them as well and want a solution I can follow. Simple regexes are OK.

I'm very open to using sympy (which looks really easy for this sort of thing) under one condition: safety. Apparently sympy uses eval under the hood. I've got pretty good safety with my current (partial) solution. If anyone has a way to make sympy safer with untrusted input, I'd welcome this too.

Solution

A regex is easily the quickest and cleanest way to get the job done in vanilla Python, and I'll even explain the regex for you, because regexes are such a powerful tool that they're worth understanding.

To accomplish your goal, use the following statement:

import re

# Set 'thefunction' to the string you're parsing (example value shown here).
thefunction = "sin(3x)(4)"
# re.sub returns a new string, so keep the result.
result = re.sub(r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()", r"\1*\2", thefunction)

I know it's a bit long and complicated, but I don't see a simpler solution that wouldn't need even more hacky workarounds than what's gone into the regex here. It has been tested against all three of your test cases and produces exactly the output you asked for.
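As a quick check of that claim, here is a minimal sketch that wraps the statement above in a helper and runs it on the three examples from the question. The names IMPLICIT_MUL and add_mul_signs are made up for this illustration and are not part of the original answer.

```python
import re

# The pattern from the answer, compiled once so it can be reused.
# IMPLICIT_MUL and add_mul_signs are hypothetical names for this sketch.
IMPLICIT_MUL = re.compile(r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()")


def add_mul_signs(expr):
    """Insert '*' wherever the pattern finds an implied multiplication."""
    return IMPLICIT_MUL.sub(r"\1*\2", expr)


# The three test cases from the question, with the expected results.
cases = {
    "3x": "3*x",
    "4(x+5)": "4*(x+5)",
    "sin(3x)(4)": "sin(3x)*(4)",
}
for source, expected in cases.items():
    assert add_mul_signs(source) == expected
```

Note that the pattern only covers the shapes described in the question: an input such as (x+1)(x+2) passes through unchanged, because group 1 only matches a number or a call with a single word-like argument.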

As a brief explanation of what's going on here: The first parameter to re.sub is the regular expression, which matches a certain pattern. The second is the thing we're replacing it with, and the third is the actual string to replace things in. Every time our regex sees a match, it removes it and plugs in the substitution, with some special behind-the-scenes tricks.
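To make the argument order concrete, here is a deliberately simple sketch using a much smaller pattern than the one above. The variable names and the whitespace-stripping task are just for illustration.

```python
import re

spaced = "3 x + 1"

# re.sub(pattern, replacement, string):
#   pattern     - r"\s+" matches one or more whitespace characters
#   replacement - "" (replace each match with nothing)
#   string      - the text to search
compact = re.sub(r"\s+", "", spaced)

print(compact)  # 3x+1
print(spaced)   # 3 x + 1  (the input string itself is not modified)
```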

A more in-depth analysis of the regex follows:

  • ((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\() : Matches a number or a function call, followed by a variable or an opening parenthesis.
    • ((?:\d+)|(?:[a-zA-Z]\w*\(\w+\))) : Group 1. Note: Parentheses delimit a Group, which is sort of a sub-regex. Capturing groups are indexed for future reference; groups can also be repeated with modifiers (described later). This group matches a number or a function call.
      • (?:\d+) : Non-capturing group. Any group with ?: immediately after the opening parenthesis will not assign an index to itself, but still acts as a "section" of the pattern. For example, A(?:bc)+ will match "Abcbcbcbc..." and so on, but you cannot access the "bcbcbcbc" match with an index. Without the group, "Abc+" would instead match "Abcccccccc...", because the + would apply only to the "c".
        • \d : Matches any numerical digit once. A regex of \d on its own will match, separately, "1", "2", and "3" of "123".
        • + : Matches the previous element one or more times. In this case, the previous element is \d, any digit. In the previous example, \d+ on "123" will successfully match "123" as a single element. This is vital to our regex, to make sure that multi-digit numbers are properly registered.
      • | : Pipe character, and in a regex, it effectively says or: "a|b" will match "a" OR "b". In this case, it separates "a number" and "a function call"; match a number OR a function call.
      • (?:[a-zA-Z]\w*\(\w+\)) : Matches a function call. Also a non-capturing group, like (?:\d+).
        • [a-zA-Z] : Matches the first letter of the function call. There is no modifier on this because we only need to ensure the first character is a letter; A123 is technically a valid function name.
        • \w : Matches any alphanumeric character or an underscore. After the first letter is ensured, the following characters could be letters, numbers, or underscores and still be valid as a function name.
        • * : Matches the previous element 0 or more times. While initially seeming unnecessary, the star character effectively makes an element optional. In this case, our modified element is \w, but a function name doesn't technically need more than one character; A is a valid function name. A would be matched by [a-zA-Z], making \w unnecessary. On the other end of the spectrum, there could be any number of characters following the first letter, which is why we need this modifier.
        • \( : This is important to understand: this is not another group. The backslash here acts much like an escape character would in a normal string. In a regex, any time you preface a special character, such as parentheses, +, or * with a backslash, it uses it like a normal character. \( matches an opening parenthesis, for the actual function call part of the function.
        • \w+ : Matches a number, letter or underscore one or more times. This ensures the function actually has a parameter going into it.
        • \) : Like \(, but matches a closing parenthesis
    • ((?:[a-zA-Z]\w*)|\() : Group 2. Matches a variable, or an opening parenthesis.
      • (?:[a-zA-Z]\w*) : Matches a variable. This is the exact same as our function name matcher. It sits in a non-capturing group to make explicit what the OR that follows applies to. Strictly speaking the grouping is optional here: | has the lowest precedence in a regex, so inside Group 2 it already splits the pattern into "a letter followed by word characters" OR "an opening parenthesis". Wrapping the first alternative simply makes that boundary easier to see.
      • | : Or character. Matches (?:[a-zA-Z]\w*) or \(.
      • \( : Matches an opening parenthesis. Once we have checked if there is an opening parenthesis, we don't need to check anything beyond it for the purposes of our regex.
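The individual building blocks described in the list above can be tried out one at a time. The following is a small sketch using only the standard re module; the test strings are chosen for illustration.

```python
import re

# \d matches one digit at a time; \d+ matches a whole run of digits.
assert re.findall(r"\d", "123") == ["1", "2", "3"]
assert re.findall(r"\d+", "123") == ["123"]

# A non-capturing group repeats as a unit; without it, + applies only to "c".
assert re.fullmatch(r"A(?:bc)+", "Abcbcbc") is not None
assert re.fullmatch(r"Abc+", "Abcbcbc") is None
assert re.fullmatch(r"Abc+", "Abccc") is not None

# Escaped parentheses match literal "(" and ")" instead of opening a group.
call = r"[a-zA-Z]\w*\(\w+\)"
assert re.fullmatch(call, "sin(3x)") is not None
assert re.fullmatch(call, "sin()") is None  # \w+ requires at least one character

# Only the two outer groups capture; the inner (?:...) groups get no index.
full = r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()"
match = re.match(full, "sin(3x)(4)")
assert match.groups() == ("sin(3x)", "(")
```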

Now, remember our two groups, group one and group two? These are used in the substitution string, "\1*\2". The substitution string is not a true regex, but it still has certain special characters. In this case, \<number> will insert the group of that number. So our substitution string is saying: "Put group 1 in (which is either our function call or our number), then put in an asterisk (*), then put in our second group (either a variable or a parenthesis)"
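To illustrate what the substitution string does, here is a sketch comparing three equivalent ways of writing the replacement: the \1*\2 template from the answer, the longer \g<1> spelling that re.sub also accepts, and a replacement function. The helper name join_with_star is hypothetical.

```python
import re

pattern = re.compile(r"((?:\d+)|(?:[a-zA-Z]\w*\(\w+\)))((?:[a-zA-Z]\w*)|\()")


def join_with_star(match):
    # Does by hand what the template r"\1*\2" does: group 1, "*", group 2.
    return match.group(1) + "*" + match.group(2)


expr = "4(x+5)"
from_template = pattern.sub(r"\1*\2", expr)
from_named_syntax = pattern.sub(r"\g<1>*\g<2>", expr)
from_function = pattern.sub(join_with_star, expr)

assert from_template == from_named_syntax == from_function == "4*(x+5)"
```

A replacement function is mostly useful when the inserted text depends on what was matched; for a fixed "*" the template form is the shortest.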

I think that about sums it up!
