Evaluating a mathematical expression (function) for a large number of input values fast

Views: 98 | Published: 2020/5/31 22:47:03 | Tags: python, eval, abstract-syntax-tree, sympy, numexpr

Problem description

The following questions

Evaluating a mathematical expression in a string
Equation parsing in Python
Safe way to parse user-supplied mathematical formula in Python
Evaluate math equations from unsafe user input in Python

and their respective answers made me think about how I could parse a single mathematical expression (in general terms along the lines of this answer: https://stackoverflow.com/a/594294/1672565), given by a (more or less trusted) user, efficiently for 20k to 30k input values coming from a database. I implemented a quick and dirty benchmark so I could compare different solutions.

# Runs with Python 3(.4)
import pprint
import time

# This is what I have
userinput_function = '5*(1-(x*0.1))' # String - numbers should be handled as floats
demo_len = 20000 # Parameter for benchmark (20k to 30k in real life)
print_results = False

# Some database, represented by an array of dicts (simplified for this example)

database_xy = []
for a in range(1, demo_len, 1):
    database_xy.append({
        'x':float(a),
        'y_eval':0,
        'y_sympya':0,
        'y_sympyb':0,
        'y_sympyc':0,
        'y_aevala':0,
        'y_aevalb':0,
        'y_aevalc':0,
        'y_numexpr': 0,
        'y_simpleeval':0
        })

# Solution #1: eval [yep, totally unsafe]

time_start = time.time()
func = eval("lambda x: " + userinput_function)
for item in database_xy:
    item['y_eval'] = func(item['x'])
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('1 eval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2a: sympy - evalf (http://www.sympy.org)

import sympy
time_start = time.time()
x = sympy.symbols('x')
sympy_function = sympy.sympify(userinput_function)
for item in database_xy:
    item['y_sympya'] = float(sympy_function.evalf(subs={x:item['x']}))
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('2a sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2b: sympy - lambdify (http://www.sympy.org)

from sympy.utilities.lambdify import lambdify
import sympy
import numpy
time_start = time.time()
sympy_functionb = sympy.sympify(userinput_function)
func = lambdify(x, sympy_functionb, 'numpy') # returns a numpy-ready function
xx = numpy.zeros(len(database_xy))
for index, item in enumerate(database_xy):
    xx[index] = item['x']
yy = func(xx)
for index, item in enumerate(database_xy):
    item['y_sympyb'] = yy[index]
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('2b sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2c: sympy - lambdify with numexpr [and numpy] (http://www.sympy.org)

from sympy.utilities.lambdify import lambdify
import sympy
import numpy
import numexpr
time_start = time.time()
sympy_functionb = sympy.sympify(userinput_function)
func = lambdify(x, sympy_functionb, 'numexpr') # returns a function that evaluates the expression via numexpr
xx = numpy.zeros(len(database_xy))
for index, item in enumerate(database_xy):
    xx[index] = item['x']
yy = func(xx)
for index, item in enumerate(database_xy):
    item['y_sympyc'] = yy[index]
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('2c sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3a: asteval [based on ast] - with string magic (http://newville.github.io/asteval/index.html)

from asteval import Interpreter
aevala = Interpreter()
time_start = time.time()
aevala('def func(x):\n\treturn ' + userinput_function)
for item in database_xy:
    item['y_aevala'] = aevala('func(' + str(item['x']) + ')')
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('3a aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3b (M Newville): asteval [based on ast] - parse & run (http://newville.github.io/asteval/index.html)

from asteval import Interpreter
aevalb = Interpreter()
time_start = time.time()
exprb = aevalb.parse(userinput_function)
for item in database_xy:
    aevalb.symtable['x'] = item['x']
    item['y_aevalb'] = aevalb.run(exprb)
time_end = time.time()
print('3b aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3c (M Newville): asteval [based on ast] - parse & run with numpy (http://newville.github.io/asteval/index.html)

from asteval import Interpreter
import numpy
aevalc = Interpreter()
time_start = time.time()
exprc = aevalc.parse(userinput_function)
x = numpy.array([item['x'] for item in database_xy])
aevalc.symtable['x'] = x
y = aevalc.run(exprc)
for index, item in enumerate(database_xy):
    item['y_aevalc'] = y[index]
time_end = time.time()
print('3c aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #4: simpleeval [based on ast] (https://github.com/danthedeckie/simpleeval)

from simpleeval import simple_eval
time_start = time.time()
for item in database_xy:
    item['y_simpleeval'] = simple_eval(userinput_function, names={'x': item['x']})
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('4 simpleeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #5 numexpr [and numpy] (https://github.com/pydata/numexpr)

import numpy
import numexpr
time_start = time.time()
x = numpy.zeros(len(database_xy))
for index, item in enumerate(database_xy):
    x[index] = item['x']
y = numexpr.evaluate(userinput_function)
for index, item in enumerate(database_xy):
    item['y_numexpr'] = y[index]
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('5 numexpr: ' + str(round(time_end - time_start, 4)) + ' seconds')

On my old test machine (Python 3.4, Linux 3.11 x86_64, two cores, 1.8GHz) I get the following results:

1 eval: 0.0185 seconds
2a sympy: 10.671 seconds
2b sympy: 0.0315 seconds
2c sympy: 0.0348 seconds
3a aeval: 2.8368 seconds
3b aeval: 0.5827 seconds
3c aeval: 0.0246 seconds
4 simpleeval: 1.2363 seconds
5 numexpr: 0.0312 seconds
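
Each figure above comes from a single run timed with time.time(), so the smaller values may be noisy. As a rough sketch only, one solution could be timed several times with the standard-library timeit module (which uses time.perf_counter by default), keeping the fastest run. The helper name best_time and the repeat count are assumptions made for this illustration and are not part of the original benchmark.

```python
import timeit

def best_time(func, repeat=5):
    """Return the fastest of `repeat` single runs of func(), in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat))

userinput_function = '5*(1-(x*0.1))'
xs = [float(a) for a in range(1, 20000)]

# Same unsafe baseline as solution 1: build the lambda once, call it per value.
f = eval('lambda x: ' + userinput_function)

elapsed = best_time(lambda: [f(x) for x in xs])
print('1 eval: ' + str(round(elapsed, 4)) + ' seconds')
```

Taking the minimum rather than the mean is a common convention for this kind of micro-benchmark, since slower runs mostly reflect interference from other processes.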

What sticks out is the incredible speed of eval, though I do not want to use it in real life. The second-best solution seems to be numexpr, which depends on numpy, a dependency I would like to avoid, although this is not a hard requirement. The next best thing is simpleeval, which is built around ast. aeval, another ast-based solution, suffers from the fact that I have to convert every single float input value into a string first, and I could not find a way around that. sympy was initially my favorite because it offers the most flexible and apparently safest solution, but it ended up last, a long way behind the second-to-last solution.

Update 1: There is a much faster approach using sympy. See solution 2b. It is almost as good as numexpr, though I am not sure whether sympy is actually using it internally.

Update 2: The sympy implementations now use sympify instead of simplify (as recommended by its lead developer, asmeurer - thanks). It is not using numexpr unless it is explicitly asked to do so (see solution 2c). I also added two significantly faster solutions based on asteval (thanks to M Newville).

What options do I have to speed any of the relatively safer solutions up even further? Are there other, safe(-ish) approaches using ast directly for instance?
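
To make the "ast directly" idea concrete, here is a minimal sketch of one possible approach, not a vetted sandbox: parse the expression once, reject every node type that is not on a small whitelist, compile the validated tree, and then evaluate the compiled code object per input value with builtins disabled. The function name compile_expression, the whitelist itself and the restriction to a single variable are assumptions made for this illustration. It assumes Python 3.8 or later (where numeric literals parse to ast.Constant), and it leaves out the power operator on purpose, because very large exponents can be used to exhaust CPU and memory.

```python
import ast

# Assumed whitelist: plain arithmetic on numbers and one variable.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub, ast.UAdd,
)

def compile_expression(source, variable='x'):
    """Validate `source` against the whitelist and return a callable f(value)."""
    tree = ast.parse(source, mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError('disallowed syntax: ' + type(node).__name__)
        if isinstance(node, ast.Name) and node.id != variable:
            raise ValueError('unknown name: ' + node.id)
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError('only numeric constants are allowed')
    # Compile once; only the cheap evaluation step runs per input value.
    code = compile(tree, '<userinput>', 'eval')
    env = {'__builtins__': {}}
    return lambda value: eval(code, env, {variable: value})

func = compile_expression('5*(1-(x*0.1))')
results = [func(float(a)) for a in range(1, 20000)]
```

Because parsing and validation happen only once, the per-value cost is a single eval of an already compiled code object. How this compares with the timings above has not been measured here.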
