Evaluating a mathematical expression (function) for a large number of input values fast

Problem description

Several related questions and their respective answers made me think about how I could efficiently parse and evaluate a single mathematical expression (in general terms along the lines of this answer: https://stackoverflow.com/a/594294/1672565), given by a (more or less trusted) user, for 20k to 30k input values coming from a database. I implemented a quick and dirty benchmark so I could compare different solutions.

# Runs with Python 3(.4)
import pprint
import time

# This is what I have
userinput_function = '5*(1-(x*0.1))' # String - numbers should be handled as floats
demo_len = 20000 # Parameter for benchmark (20k to 30k in real life)
print_results = False

# Some database, represented by an array of dicts (simplified for this example)

database_xy = []
for a in range(1, demo_len, 1):
    database_xy.append({
        'x':float(a),
        'y_eval':0,
        'y_sympya':0,
        'y_sympyb':0,
        'y_sympyc':0,
        'y_aevala':0,
        'y_aevalb':0,
        'y_aevalc':0,
        'y_numexpr': 0,
        'y_simpleeval':0
        })

# Solution #1: eval [yep, totally unsafe]

time_start = time.time()
func = eval("lambda x: " + userinput_function)
for item in database_xy:
    item['y_eval'] = func(item['x'])
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('1 eval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2a: sympy - evalf (http://www.sympy.org)

import sympy
time_start = time.time()
x = sympy.symbols('x')
sympy_function = sympy.sympify(userinput_function)
for item in database_xy:
    item['y_sympya'] = float(sympy_function.evalf(subs={x:item['x']}))
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('2a sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2b: sympy - lambdify (http://www.sympy.org)

from sympy.utilities.lambdify import lambdify
import sympy
import numpy
time_start = time.time()
sympy_functionb = sympy.sympify(userinput_function)
func = lambdify(x, sympy_functionb, 'numpy') # returns a numpy-ready function
xx = numpy.zeros(len(database_xy))
for index, item in enumerate(database_xy):
    xx[index] = item['x']
yy = func(xx)
for index, item in enumerate(database_xy):
    item['y_sympyb'] = yy[index]
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('2b sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #2c: sympy - lambdify with numexpr [and numpy] (http://www.sympy.org)

from sympy.utilities.lambdify import lambdify
import sympy
import numpy
import numexpr
time_start = time.time()
sympy_functionb = sympy.sympify(userinput_function)
func = lambdify(x, sympy_functionb, 'numexpr') # returns a function that evaluates via numexpr
xx = numpy.zeros(len(database_xy))
for index, item in enumerate(database_xy):
    xx[index] = item['x']
yy = func(xx)
for index, item in enumerate(database_xy):
    item['y_sympyc'] = yy[index]
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('2c sympy: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3a: asteval [based on ast] - with string magic (http://newville.github.io/asteval/index.html)

from asteval import Interpreter
aevala = Interpreter()
time_start = time.time()
aevala('def func(x):\n\treturn ' + userinput_function)
for item in database_xy:
    item['y_aevala'] = aevala('func(' + str(item['x']) + ')')
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('3a aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3b (M Newville): asteval [based on ast] - parse & run (http://newville.github.io/asteval/index.html)

from asteval import Interpreter
aevalb = Interpreter()
time_start = time.time()
exprb = aevalb.parse(userinput_function)
for item in database_xy:
    aevalb.symtable['x'] = item['x']
    item['y_aevalb'] = aevalb.run(exprb)
time_end = time.time()
print('3b aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #3c (M Newville): asteval [based on ast] - parse & run with numpy (http://newville.github.io/asteval/index.html)

from asteval import Interpreter
import numpy
aevalc = Interpreter()
time_start = time.time()
exprc = aevalc.parse(userinput_function)
x = numpy.array([item['x'] for item in database_xy])
aevalc.symtable['x'] = x
y = aevalc.run(exprc)
for index, item in enumerate(database_xy):
    item['y_aevalc'] = y[index]
time_end = time.time()
print('3c aeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #4: simpleeval [based on ast] (https://github.com/danthedeckie/simpleeval)

from simpleeval import simple_eval
time_start = time.time()
for item in database_xy:
    item['y_simpleeval'] = simple_eval(userinput_function, names={'x': item['x']})
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('4 simpleeval: ' + str(round(time_end - time_start, 4)) + ' seconds')

# Solution #5: numexpr [and numpy] (https://github.com/pydata/numexpr)

import numpy
import numexpr
time_start = time.time()
x = numpy.zeros(len(database_xy))
for index, item in enumerate(database_xy):
    x[index] = item['x']
y = numexpr.evaluate(userinput_function)
for index, item in enumerate(database_xy):
    item['y_numexpr'] = y[index]
time_end = time.time()
if print_results:
    pprint.pprint(database_xy)
print('5 numexpr: ' + str(round(time_end - time_start, 4)) + ' seconds')

On my old test machine (Python 3.4, Linux 3.11 x86_64, two cores, 1.8GHz) I get the following results:

1 eval: 0.0185 seconds
2a sympy: 10.671 seconds
2b sympy: 0.0315 seconds
2c sympy: 0.0348 seconds
3a aeval: 2.8368 seconds
3b aeval: 0.5827 seconds
3c aeval: 0.0246 seconds
4 simpleeval: 1.2363 seconds
5 numexpr: 0.0312 seconds
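
The figures above are single runs timed with time.time(), so they can fluctuate between runs. As a minimal sketch, assuming only the standard library, the same kind of measurement could be repeated with timeit and the best run reported. The helper name run_once is made up for this illustration, and eval serves only as a stand-in workload (it is still unsafe for untrusted input).

```python
import timeit

userinput_function = '5*(1-(x*0.1))'        # same expression as in the benchmark
xs = [float(a) for a in range(1, 20000)]    # same input range as the benchmark

# eval is only a stand-in workload here; it is NOT safe for untrusted input.
func = eval('lambda x: ' + userinput_function)

def run_once():
    # Hypothetical helper: evaluate the expression for every input value.
    return [func(x) for x in xs]

# repeat() returns one total time per repetition; the minimum is the
# measurement least disturbed by other activity on the machine.
timings = timeit.repeat(run_once, number=1, repeat=5)
best = min(timings)
print('eval (best of 5): ' + str(round(best, 4)) + ' seconds')
```

Any of the other solutions could be dropped into run_once in the same way to make the comparison less sensitive to noise.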

What sticks out is the incredible speed of eval, though I do not want to use it in real life. The second best solution seems to be numexpr, which depends on numpy, a dependency I would like to avoid, although this is not a hard requirement. The next best thing is simpleeval, which is built around ast. aeval, another ast-based solution, suffers from the fact that I have to convert every single float input value into a string first, and I could not find a way around that. sympy was initially my favorite because it offers the most flexible and apparently safest solution, but it ended up last, at an impressive distance from the second-to-last solution.

Update 1: There is a much faster approach using sympy. See solution 2b. It is almost as good as numexpr, though I am not sure whether sympy is actually using it internally.

Update 2: The sympy implementations now use sympify instead of simplify (as recommended by its lead developer, asmeurer - thanks). It is not using numexpr unless it is explicitly asked to do so (see solution 2c). I also added two significantly faster solutions based on asteval (thanks to M Newville).

What options do I have to speed any of the relatively safer solutions up even further? Are there other, safe(-ish) approaches using ast directly for instance?

Recommended answer

Since you asked about asteval, there is a way to use it and get faster results:

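from asteval import Interpreter

aeval = Interpreter()
time_start = time.time()
expr = aeval.parse(userinput_function)   # parse the expression once
for item in database_xy:
    aeval.symtable['x'] = item['x']      # bind the current input value
    item['y_aeval'] = aeval.run(expr)    # evaluate the pre-parsed expression
time_end = time.time()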

That is, you can first parse ("pre-compile") the user input function, then insert each new value of x into the symbol table and use Interpreter.run() to evaluate the compiled expression for that value. At your scale, I think this will get you close to 0.5 seconds.
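
The parse-once idea is not specific to asteval. As a rough illustration, here is the same pattern using only Python's built-in compile() and eval() as a stand-in for Interpreter.parse() and Interpreter.run(). This is an assumption made so the sketch runs without extra packages; it is not how asteval works internally, and eval with emptied builtins is still not safe for untrusted input. The function names are made up for this example.

```python
import timeit

userinput_function = '5*(1-(x*0.1))'
xs = [float(a) for a in range(1, 2000)]

# Emptying __builtins__ reduces, but does not eliminate, the risks of eval.
restricted_globals = {'__builtins__': {}}

def parse_every_time():
    # The source string is parsed and compiled again for every input value.
    return [eval(userinput_function, restricted_globals, {'x': x}) for x in xs]

# Parse and compile once, up front.
code = compile(userinput_function, '<userinput>', 'eval')

def parse_once():
    # Only the evaluation step is repeated for every input value.
    return [eval(code, restricted_globals, {'x': x}) for x in xs]

t_every = min(timeit.repeat(parse_every_time, number=1, repeat=3))
t_once = min(timeit.repeat(parse_once, number=1, repeat=3))
print('parse every time: ' + str(round(t_every, 4)) + ' seconds')
print('parse once:       ' + str(round(t_once, 4)) + ' seconds')
```

Both functions return the same values; the difference is only in how often the parsing work is done.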

If you are willing to use numpy, a hybrid solution:

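import numpy
from asteval import Interpreter

aeval = Interpreter()
time_start = time.time()
expr = aeval.parse(userinput_function)                # parse the expression once
x = numpy.array([item['x'] for item in database_xy])  # all inputs as one array
aeval.symtable['x'] = x
y = aeval.run(expr)                                   # one vectorized evaluation
time_end = time.time()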

should be much faster, and comparable in run time to using numexpr.
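
On the question's last point about using ast directly: one possible approach, sketched here under assumptions rather than taken from any of the libraries above, is to parse the expression once with the standard ast module, reject every node type that is not on a small whitelist, and build a plain Python closure from the remaining nodes. The sketch assumes Python 3.8+ (for ast.Constant), a single variable named x, and only the operators +, -, * and /. The name compile_expression is hypothetical.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARYOPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def compile_expression(source, variable='x'):
    """Turn an arithmetic expression string into a callable of one argument."""
    tree = ast.parse(source, mode='eval')

    def build(node):
        # Numeric literals are handled as floats, as the question requires.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            value = float(node.value)
            return lambda x: value
        # The only name allowed is the input variable.
        if isinstance(node, ast.Name) and node.id == variable:
            return lambda x: x
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            op = _BINOPS[type(node.op)]
            left, right = build(node.left), build(node.right)
            return lambda x: op(left(x), right(x))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARYOPS:
            op = _UNARYOPS[type(node.op)]
            operand = build(node.operand)
            return lambda x: op(operand(x))
        # Calls, attributes, subscripts, other names, etc. end up here.
        raise ValueError('unsupported syntax: ' + type(node).__name__)

    return build(tree.body)

func = compile_expression('5*(1-(x*0.1))')
values = [func(float(a)) for a in range(1, 20000)]
```

Because the tree is walked only once, the per-value cost is a handful of nested function calls. Whether that beats the asteval variants above would have to be measured; no timing is claimed here.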
