Performance of Mass-Evaluating Expressions in IronPython


Question

In a C# 4.0 application, I have a Dictionary of strongly typed ILists that all have the same length: a dynamically built, strongly typed, column-based table. I want the user to provide one or more (Python) expressions over the available columns, which will then be aggregated over all rows. In a static context it would be:

IDictionary<string, IList> table;
// ...
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
    sum += (double)a[i] / b[i]; // Expression to sum up

For n = 10^7 this runs in 0.270 sec on my laptop (Win7 x64). Replacing the expression with a delegate taking two int arguments takes 0.580 sec; with an untyped delegate it takes 1.19 sec. Creating the delegate from IronPython with

IDictionary<string, IList> table;
// ...
var options = new Dictionary<string, object>();
options["DivisionOptions"] = PythonDivisionOptions.New;
var engine = Python.CreateEngine(options);
string expr = "a / b";
Func<int, int, double> f = engine.Execute("lambda a, b : " + expr);

IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
    sum += f(a[i], b[i]);

takes 3.2 sec (and 5.1 sec with Func<object, object, object>), a factor of 4 to 5.5 compared with the corresponding C# delegates. Is this the expected overhead for what I'm doing? What could be improved?
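To make the three pure C# variants timed above concrete, here is a minimal sketch of what they might look like. The class and method names (DelegateVariants, SumInline, SumTyped, SumUntyped) are illustrative assumptions and are not taken from the original code. The untyped variant boxes both arguments and unboxes the result on every call, which is presumably where part of the extra time goes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names, for illustration only.
static class DelegateVariants
{
    // Variant 1: the expression is written inline in the loop.
    public static double SumInline(IList<int> a, IList<int> b)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += (double)a[i] / b[i];
        return sum;
    }

    // Variant 2: the expression sits behind a typed delegate (no boxing).
    public static double SumTyped(IList<int> a, IList<int> b, Func<int, int, double> f)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += f(a[i], b[i]);
        return sum;
    }

    // Variant 3: untyped delegate; both ints are boxed and the result is unboxed.
    public static double SumUntyped(IList<int> a, IList<int> b, Func<object, object, object> f)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += (double)f(a[i], b[i]);
        return sum;
    }
}
```

A call such as DelegateVariants.SumTyped(a, b, (x, y) => (double)x / y) corresponds to the typed case, and DelegateVariants.SumUntyped(a, b, (x, y) => (double)(int)x / (int)y) to the untyped one.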

If I have many columns, the approach chosen above will no longer be sufficient. One solution could be to determine the required columns for each expression and pass only those as arguments. The other solution, which I tried unsuccessfully, was to use a ScriptScope and resolve the columns dynamically. For that I defined a RowIterator that has a RowIndex for the active row and a property for each column.

class RowIterator
{
    IList<int> la;
    IList<int> lb;

    public RowIterator(IList<int> a, IList<int> b)
    {
        this.la = a;
        this.lb = b;
    }
    public int RowIndex { get; set; }

    public int a { get { return la[RowIndex]; } }
    public int b { get { return lb[RowIndex]; } }
}
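The first idea mentioned above (determine which columns an expression needs and pass only those as lambda arguments) could be sketched as follows. This is a rough approximation under stated assumptions: it scans the expression text for identifiers with a regular expression instead of parsing Python, so names inside string literals would be picked up too. ColumnFinder, RequiredColumns and BuildLambdaSource are hypothetical names, not part of IronPython or of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class ColumnFinder
{
    // Identifiers that are not preceded by a word character or a dot,
    // so "1e5" and the "sqrt" in "math.sqrt" are skipped.
    private static readonly Regex Identifier = new Regex(@"(?<![\w.])[A-Za-z_]\w*");

    // Returns the known column names used in expr, in order of first use.
    public static List<string> RequiredColumns(string expr, IEnumerable<string> columns)
    {
        var known = new HashSet<string>(columns);
        return Identifier.Matches(expr)
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(name => known.Contains(name))
            .Distinct()
            .ToList();
    }

    // Builds Python lambda source that takes only the required columns.
    public static string BuildLambdaSource(string expr, IEnumerable<string> columns)
    {
        var args = RequiredColumns(expr, columns);
        return "lambda " + string.Join(", ", args) + ": " + expr;
    }
}
```

For the table in the question, BuildLambdaSource("a / b", table.Keys) would produce the same source string that is passed to engine.Execute above, and RequiredColumns tells the caller which ILists to index in the loop.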

A ScriptScope can be created from an IDynamicMetaObjectProvider, which I expected C#'s dynamic to implement. At runtime, however, the call binds to engine.CreateScope(IDictionary), which fails.

dynamic iterator = new RowIterator(a, b) as dynamic;
var scope = engine.CreateScope(iterator);
var expr = engine.CreateScriptSourceFromString("a / b").Compile();

double sum = 0;
for (int i = 0; i < n; i++)
{
    iterator.RowIndex = i;
    sum += expr.Execute<double>(scope);
}



Next I let RowIterator inherit from DynamicObject and got a running example, but with terrible performance: 158 sec.

class DynamicRowIterator : DynamicObject
{
    Dictionary<string, object> members = new Dictionary<string, object>();
    IList<int> la;
    IList<int> lb;

    public DynamicRowIterator(IList<int> a, IList<int> b)
    {
        this.la = a;
        this.lb = b;
    }

    public int RowIndex { get; set; }
    public int a { get { return la[RowIndex]; } }
    public int b { get { return lb[RowIndex]; } }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        if (binder.Name == "a") // Why does this happen?
        {
            result = this.a;
            return true;
        }
        if (binder.Name == "b")
        {
            result = this.b;
            return true;
        }
        if (base.TryGetMember(binder, out result))
            return true;
        if (members.TryGetValue(binder.Name, out result))
            return true;
        return false;
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        if (base.TrySetMember(binder, value))
            return true;
        members[binder.Name] = value;
        return true;
    }
}



I was surprised that TryGetMember is called with the names of the declared properties. From the documentation I would have expected TryGetMember to be called only for undefined properties.
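For comparison, with C#'s own dynamic binder a DynamicObject behaves as the documentation suggests: declared members are bound first, and TryGetMember is reached only for names the class does not define. The sketch below records which names reach TryGetMember. Probe and ProbeDemo are hypothetical names, and the code needs a reference to Microsoft.CSharp for dynamic. The different behavior seen with the IronPython scope presumably comes from how the scope lookup is bound rather than from DynamicObject itself; that is an assumption and not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical class, for illustration only.
class Probe : DynamicObject
{
    // Names that actually reached TryGetMember.
    public readonly List<string> Requested = new List<string>();

    // A declared property, like a and b in DynamicRowIterator.
    public int a { get { return 1; } }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        Requested.Add(binder.Name);
        result = 42;
        return true;
    }
}

static class ProbeDemo
{
    public static List<string> Run()
    {
        var probe = new Probe();
        dynamic d = probe;
        int declared = d.a;       // bound to the declared property
        int missing = d.missing;  // not declared, falls through to TryGetMember
        return probe.Requested;
    }
}
```

With the C# binder, the list returned by ProbeDemo.Run() contains only the undeclared name.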

For sensible performance I would probably need to implement IDynamicMetaObjectProvider for my RowIterator so that it can make use of dynamic CallSites, but I couldn't find a suitable example to start from. In my experiments I didn't know how to handle __builtins__ in BindGetMember:

class Iterator : IDynamicMetaObjectProvider
{
    IList<int> la;
    IList<int> lb;

    public Iterator(IList<int> a, IList<int> b)
    {
        this.la = a;
        this.lb = b;
    }
    public int RowIndex { get; set; }
    public int a { get { return la[RowIndex]; } }
    public int b { get { return lb[RowIndex]; } }

    public DynamicMetaObject GetMetaObject(Expression parameter)
    {
        return new MetaObject(parameter, this);
    }

    private class MetaObject : DynamicMetaObject
    {
        internal MetaObject(Expression parameter, Iterator self)
             : base(parameter, BindingRestrictions.Empty, self) { }

        public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
        {
            switch (binder.Name)
            {
                case "a":
                case "b":
                    // a and b are properties, so bind to the property of the same
                    // name and box the int result, since GetMember returns object.
                    return new DynamicMetaObject(
                        Expression.Convert(
                            Expression.Property(
                                Expression.Convert(Expression, LimitType),
                                binder.Name),
                            typeof(object)),
                        BindingRestrictions.GetTypeRestriction(Expression, LimitType));
                default:
                    return base.BindGetMember(binder);
            }
        }
    }
}



I'm sure my code above is suboptimal; at the very least it doesn't handle the IDictionary of columns yet. I would be grateful for any advice on how to improve the design and/or performance.

Recommended answer

I also compared the performance of IronPython against a C# implementation. The expression is simple: it just adds the values of two arrays at a specified index. Accessing the arrays directly provides the baseline and theoretical optimum. Accessing the values via a symbol dictionary still has acceptable performance.

The third test creates a delegate from a naive (and intentionally bad) expression tree without any fancy stuff like call-site caching, but it's still faster than IronPython.

Scripting the expression via IronPython takes the most time. My profiler shows that most of the time is spent in PythonOps.GetVariable, PythonDictionary.TryGetValue and PythonOps.TryGetBoundAttr. I think there's room for improvement.

Timings:

  • Direct: 00:00:00.0052680
  • via Dictionary: 00:00:00.5577922
  • Compiled Delegate: 00:00:03.2733377
  • Scripted: 00:00:09.0485515

Here is the code:

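    // Requires: using System; using System.Collections.Generic; using System.Diagnostics;
    // using System.Linq; using System.Linq.Expressions; using IronPython.Hosting;
    public static void PythonBenchmark()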
    {
        var engine = Python.CreateEngine();

        int iterations = 1000;
        int count = 10000;

        int[] a = Enumerable.Range(0, count).ToArray();
        int[] b = Enumerable.Range(0, count).ToArray();

        Dictionary<string, object> symbols = new Dictionary<string, object> { { "a", a }, { "b", b } };

        Func<int, object> calculate = engine.Execute("lambda i: a[i] + b[i]", engine.CreateScope(symbols));

        var sw = Stopwatch.StartNew();

        int sum = 0;

        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += a[i] + b[i];
            }
        }

        Console.WriteLine("Direct: " + sw.Elapsed);



        sw.Restart();
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += ((int[])symbols["a"])[i] + ((int[])symbols["b"])[i];
            }
        }

        Console.WriteLine("via Dictionary: " + sw.Elapsed);



        var indexExpression = Expression.Parameter(typeof(int), "index");
        var indexerMethod = typeof(IList<int>).GetMethod("get_Item");
        var lookupMethod = typeof(IDictionary<string, object>).GetMethod("get_Item");
        Func<string, Expression> getSymbolExpression = symbol => Expression.Call(Expression.Constant(symbols), lookupMethod, Expression.Constant(symbol));
        var addExpression = Expression.Add(
                                Expression.Call(Expression.Convert(getSymbolExpression("a"), typeof(IList<int>)), indexerMethod, indexExpression),
                                Expression.Call(Expression.Convert(getSymbolExpression("b"), typeof(IList<int>)), indexerMethod, indexExpression));
        var compiledFunc = Expression.Lambda<Func<int, object>>(Expression.Convert(addExpression, typeof(object)), indexExpression).Compile();

        sw.Restart();
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += (int)compiledFunc(i);
            }
        }

        Console.WriteLine("Compiled Delegate: " + sw.Elapsed);



        sw.Restart();
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += (int)calculate(i);
            }
        }

        Console.WriteLine("Scripted: " + sw.Elapsed);
        Console.WriteLine(sum); // make sure cannot be optimized away
    }
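The compiled delegate in the benchmark repeats both dictionary lookups and boxes its result on every call, which may explain part of its cost. As a sketch of how that test could be tightened (an assumption, not something measured in the answer), the arrays can be captured as constants when the tree is built and the delegate typed as Func<int, int>. The names TypedTree and BuildAdd are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper, for illustration only.
static class TypedTree
{
    // Builds index => a[index] + b[index] with both arrays bound at build time,
    // so a call involves neither dictionary lookups nor boxing.
    public static Func<int, int> BuildAdd(int[] a, int[] b)
    {
        var index = Expression.Parameter(typeof(int), "index");
        var body = Expression.Add(
            Expression.ArrayIndex(Expression.Constant(a), index),
            Expression.ArrayIndex(Expression.Constant(b), index));
        return Expression.Lambda<Func<int, int>>(body, index).Compile();
    }
}
```

The trade-off is that the delegate is tied to the array instances it was built with, so it has to be rebuilt if a column is replaced in the symbol dictionary.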
